Abstract

Typechecking consists of statically verifying whether the output of an 
XML transformation always conforms to an output type for documents 
satisfying a given input type. In this general setting, both the input and 
output schema as well as the transformation are part of the input for the 
problem. However, scenarios where the input or output schema can be
considered fixed are quite common in practice. In the present work,
we investigate the computational complexity of the typechecking problem 
in the latter setting. 



1 Introduction

In a typical XML data exchange scenario on the web, a user community creates
a common schema and agrees on producing only XML data conforming to that
schema. This raises the issue of typechecking: verifying at compile time that
every XML document which is the result of a specified query or document
transformation applied to a valid input document satisfies the output schema [?].

The typechecking problem is determined by three parameters: the classes
of allowed input and output schemas, and the class of XML transformations.
As typechecking quickly becomes intractable [2, 23, 25], we focus on simple but
practical XML transformations where only little restructuring is needed, such
as, for instance, in filtering of documents. In this connection, we think, for
example, of transformations that can be expressed by structural recursion [8]
or by a top-down fragment of XSLT [?]. As is customary, we abstract such
transformations by unranked tree transducers [?]. As schemas, we adopt the
usual Document Type Definitions (DTDs) and their robust extensions: regular
tree languages [?] or, equivalently, specialized DTDs [?]. The latter
serve as a formal model for XML Schema [?].

*An extended abstract of a part of this paper appeared as Section 3 in reference [22] in the
ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2004.
†Corresponding author. Email: wim.martens@uhasselt.be
‡Email: frank.neven@uhasselt.be
§Email: marc.gyssens@uhasselt.be


Our work studies sound and complete typechecking algorithms, an approach 
that should be contrasted with the work on general-purpose XML programming 
languages like XDuce and CDuce [?], for instance, where the main objective
is fast and sound typechecking. The latter kind of typechecking is always incom- 
plete due to the Turing-completeness of the considered XML-transformations. 
That is, it can happen that type safe transformations are rejected by the type- 
checker. As we only consider very simple transformations which are by no means 
Turing-complete, it makes sense to ask for complete algorithms. 

The typechecking scenario outlined above is very general: both the schemas
and the transducer are part of the input. However, for some
exchange scenarios, it makes sense to consider the input and/or output schema
to be fixed, namely when transformations are always from within and/or to a
specific community. Therefore, we revisit the various instances of the typechecking
problem considered in [23] and determine the complexity in the presence of fixed
input and/or output schemas. The main goal of this paper is to investigate to
what extent the complexity of the typechecking problem is lowered in scenarios
where the input and/or output schema is fixed. An overview of our results is
presented in Table 2.

The remainder of the paper is organized as follows. In Section 2, we discuss
related work. In Section 3, we provide the necessary definitions. In Section 4,
we discuss typechecking in the restricted settings of fixed output and/or input
schemas. The results are summarized in Table 2. We obtain several new cases
for which typechecking is in polynomial time: (i) when the input schema is fixed
and the schemas are DTDs with SL-expressions; (ii) when the output schema
is fixed and the schemas are DTDs with NFAs; and (iii) when both the input
and output schemas are fixed and the schemas are DTDs using DFAs, NFAs, or
SL-expressions. Finally, we present our conclusions.

2 Related Work 

The research on typechecking XML transformations was initiated by Milo,
Suciu, and Vianu [25]. They obtained the decidability of typechecking for
transformations realized by k-pebble transducers via a reduction to satisfiability of
monadic second-order logic. Unfortunately, in this general setting, the latter
non-elementary algorithm cannot be improved [25]. Interestingly, typechecking
of k-pebble transducers has recently been related to typechecking of
compositions of macro tree transducers [?]. Alon et al. [?] investigated typechecking
in the presence of data values and showed that the problem quickly turns
undecidable. As our interest lies in formalisms with a more manageable complexity for
the typechecking problem, we choose to work with XML transformations that
are much less expressive than k-pebble transducers and that do not change or
use data values in the process of transformation.
A problem related to typechecking is type inference [24, 25]. This problem
consists in constructing a tight output schema, given an input schema and a
transformation. Of course, solving the type inference problem implies a
solution for the typechecking problem, namely checking containment of the inferred
schema in the given one. However, characterizing output languages of
transformations is quite hard [?]. For this reason, we adopt different techniques for
obtaining complexity upper bounds for the typechecking problem.

The transducers considered in the present paper are restricted versions of
the DTL-programs studied by Maneth and Neven [?]. They already obtained
a non-elementary upper bound on the complexity of typechecking (due to the
use of monadic second-order logic in the definition of the transducers).
Recently, Maneth et al. considered the typechecking problem for an extension of
DTL-programs and obtained that typechecking is still decidable [?]. Their
typechecking algorithm, like the one of [23], is based on inverse type inference.
That is, they compute the pre-image of all ill-formed output documents and
test whether the intersection of the pre-image and the input schema is empty.
Tozawa considered typechecking with respect to tree automata for a fragment
of top-down XSLT [31]. He uses a more general framework, but he was not
able to derive a bound better than double-exponential on the complexity of his
algorithm.

Martens and Neven investigated polynomial time fragments of the
typechecking problem by putting syntactical restrictions on the tree transducers,
and making them as general as possible [21]. Here, tractability of the
typechecking problem is obtained by bounding the deletion path width of the tree
transducers. The deletion path width is a notion that measures the number of
times that a tree transducer copies part of its input. In particular, it also gives
rise to tractable fragments of the typechecking problem where the transducer is
allowed to delete in a limited manner.

3 Preliminaries 

In this section we provide the necessary background on trees, automata, and
tree transducers. In the following, $\Sigma$ always denotes a finite alphabet.

By $\mathbb{N}$ we denote the set of natural numbers. A string $w = a_1 \cdots a_n$ is
a finite sequence of $\Sigma$-symbols. The set of positions, or the domain, of $w$ is
$\mathrm{Dom}(w) = \{1, \ldots, n\}$. The length of $w$, denoted by $|w|$, is the number of
symbols occurring in it. The label of position $i$ in $w$ is denoted by $\mathrm{lab}^w(i)$.
The size of a set $S$ is denoted by $|S|$.

As usual, a nondeterministic finite automaton (NFA) over $\Sigma$ is a tuple
$N = (Q, \Sigma, \delta, I, F)$ where $Q$ is a finite set of states,
$\delta : Q \times \Sigma \to 2^Q$ is the transition function, $I \subseteq Q$ is the set
of initial states, and $F \subseteq Q$ is the set of final states. A run $\rho$ of $N$ on a
string $w \in \Sigma^*$ is a mapping from $\mathrm{Dom}(w)$ to $Q$ such that
$\rho(1) \in \delta(q, \mathrm{lab}^w(1))$ for some $q \in I$, and for $i = 1, \ldots, |w| - 1$,
$\rho(i + 1) \in \delta(\rho(i), \mathrm{lab}^w(i + 1))$. A run is accepting if
$\rho(|w|) \in F$. A string is accepted if there is an accepting run. The language
accepted by $N$ is denoted by $L(N)$. The size of $N$ is defined as
$|Q| + |\Sigma| + \sum_{q \in Q, a \in \Sigma} |\delta(q, a)|$.

A deterministic finite automaton (DFA) is an NFA where (i) $I$ is a singleton
and (ii) $|\delta(q, a)| \leq 1$ for all $q \in Q$ and $a \in \Sigma$.
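The definitions of acceptance, size, and determinism can be illustrated with a minimal Python sketch. The dictionary layout, the function names, and the example automaton (strings over {a, b} ending in "ab") are assumptions made for this illustration and do not come from the paper. The sketch also assumes the usual convention that the empty string is accepted exactly when some initial state is final, a case the run-based definition above does not cover explicitly.

```python
# Hypothetical example NFA: accepts the strings over {a, b} that end in "ab".
NFA = {
    "states": {0, 1, 2},
    "alphabet": {"a", "b"},
    "delta": {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}},
    "initial": {0},
    "final": {2},
}

def accepts(nfa, word):
    """Track the set of states reachable by some run on the prefix read so far."""
    current = set(nfa["initial"])
    for symbol in word:
        current = {r for q in current
                   for r in nfa["delta"].get((q, symbol), set())}
    # Assumed convention: the empty word is accepted iff an initial state is final.
    return bool(current & nfa["final"])

def size(nfa):
    """|Q| + |Sigma| + sum over all (q, a) of |delta(q, a)|."""
    return (len(nfa["states"]) + len(nfa["alphabet"])
            + sum(len(targets) for targets in nfa["delta"].values()))

def is_dfa(nfa):
    """A DFA has a single initial state and at most one successor per (q, a)."""
    return (len(nfa["initial"]) == 1
            and all(len(targets) <= 1 for targets in nfa["delta"].values()))
```

Simulating all runs at once through the set of reachable states avoids enumerating individual runs, of which there can be exponentially many.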



3.1 Trees and Hedges 

It is common to view XML documents as finite trees with labels from a finite
alphabet $\Sigma$. Figures 1(a) and 1(b) give an example of an XML document
together with its tree representation. Of course, elements in XML documents can
also contain references to nodes. But, as XML schema languages usually do
not constrain these nor the data values at leaves, it is safe to view schemas as
simply defining tree languages over a finite alphabet. In the rest of this section,
we introduce the necessary background concerning XML schema languages.



<store>
  <dvd>
    <title> "Amelie" </title>
    <price> 17 </price>
  </dvd>
  <dvd>
    <title> "Goodbye, Lenin!" </title>
    <price> 20 </price>
  </dvd>
  <dvd>
    <title> "Pulp Fiction" </title>
    <price> 11 </price>
    <discount> 6 </discount>
  </dvd>
</store>

(a) An example XML document. 




(b) Its tree representation with data values (tree diagram: the root "store" has
three "dvd" children, each with a "title" and a "price" child and, for the third,
a "discount" child; the data values are the leaves).



Figure 1: An example of an XML document and its tree representation. 

The set of unranked $\Sigma$-trees, denoted by $\mathcal{T}_\Sigma$, is the smallest set of
strings over $\Sigma$ and the parenthesis symbols "(" and ")" such that, for
$a \in \Sigma$ and $w \in \mathcal{T}_\Sigma^*$, $a(w)$ is in $\mathcal{T}_\Sigma$. So, a tree is either
$\varepsilon$ (empty) or is of the form $a(t_1 \cdots t_n)$ where each $t_i$ is a tree. In the
tree $a(t_1 \cdots t_n)$, the subtrees $t_1, \ldots, t_n$ are attached to
a root labeled $a$. We write $a$ rather than $a()$. Note that there is no a priori
bound on the number of children of a node in a $\Sigma$-tree; such trees are therefore
unranked. For every $t \in \mathcal{T}_\Sigma$, the set of tree-nodes of $t$, denoted by
$\mathrm{Dom}_T(t)$, is the set defined as follows:

(i) if $t = \varepsilon$, then $\mathrm{Dom}_T(t) = \emptyset$; and,

(ii) if $t = a(t_1 \cdots t_n)$, where each $t_i \in \mathcal{T}_\Sigma$, then
$\mathrm{Dom}_T(t) = \{\varepsilon\} \cup \bigcup_{i=1}^{n} \{iu \mid u \in \mathrm{Dom}_T(t_i)\}$.

(a) The tree of Figure 1(b) without data values. The nodes are
annotated next to the labels, between brackets.

(b) Tree of Figure 2(a) viewed as a hedge. The nodes are annotated
next to the labels, between brackets.

Figure 2: The document of Figure 1 without data values, viewed as a tree and
as a hedge.



Figure 2(a) contains a tree in which we annotated the nodes between brackets.
Observe that the $n$ child nodes of a node $u$ are always $u1, \ldots, un$, from left to
right. The label of a node $u$ in the tree $t = a(t_1 \cdots t_n)$, denoted by
$\mathrm{lab}^t(u)$, is defined as follows:

(i) if $u = \varepsilon$, then $\mathrm{lab}^t(u) = a$; and,

(ii) if $u = iu'$, then $\mathrm{lab}^t(u) = \mathrm{lab}^{t_i}(u')$.

We define the depth of a tree $t$, denoted by $\mathrm{depth}(t)$, as follows: if
$t = \varepsilon$, then $\mathrm{depth}(t) = 0$; and if $t = a(t_1 \cdots t_n)$, then
$\mathrm{depth}(t) = \max\{\mathrm{depth}(t_i) \mid 1 \leq i \leq n\} + 1$. In the sequel,
whenever we say tree, we always mean $\Sigma$-tree. A tree
language is a set of trees.
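The inductive definitions of tree-nodes, labels, and depth translate almost literally into code. The following Python sketch is an illustration only: the representation of a tree as a pair (label, list of children), the use of None for the empty tree, and the encoding of a node as a tuple of child indices are assumptions of this sketch. For a node without children, the maximum over the empty set of subtree depths is taken to be 0.

```python
# A tree a(t_1 ... t_n) is modelled as (label, [t_1, ..., t_n]); None is the empty tree.
STORE = ("store", [
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", []), ("discount", [])]),
])

def dom(t):
    """Tree-nodes of t; the tuple () stands for the root address epsilon."""
    if t is None:
        return []
    _, children = t
    nodes = [()]
    for i, child in enumerate(children, start=1):
        nodes += [(i,) + u for u in dom(child)]
    return nodes

def lab(t, u):
    """Label of node u in t: the root label if u is empty, else recurse into child i."""
    label, children = t
    if not u:
        return label
    return lab(children[u[0] - 1], u[1:])

def depth(t):
    """0 for the empty tree, otherwise 1 plus the maximal depth of a subtree."""
    if t is None:
        return 0
    _, children = t
    return 1 + max((depth(c) for c in children), default=0)
```

On the store document without data values, dom yields eleven nodes, and the node (3, 3) carries the label "discount".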

A hedge is a finite sequence of trees. Hence, the set of hedges, denoted by
$\mathcal{H}_\Sigma$, equals $\mathcal{T}_\Sigma^*$. For every hedge $h \in \mathcal{H}_\Sigma$, the set of
hedge-nodes of $h$, denoted by $\mathrm{Dom}_H(h)$, is the subset of $\mathbb{N}^*$ defined as follows:

(i) if $h = \varepsilon$, then $\mathrm{Dom}_H(h) = \emptyset$; and,

(ii) if $h = t_1 \cdots t_n$ and each $t_i \in \mathcal{T}_\Sigma$, then
$\mathrm{Dom}_H(h) = \bigcup_{i=1}^{n} \{iu \mid u \in \mathrm{Dom}_T(t_i)\}$.

The label of a node $u = iu'$ in the hedge $h = t_1 \cdots t_n$, denoted by
$\mathrm{lab}^h(u)$, is defined as $\mathrm{lab}^h(u) = \mathrm{lab}^{t_i}(u')$. Note that the set of
hedge-nodes of a hedge consisting of one tree is different from the set of
tree-nodes of this tree. For example: if the tree in Figure 2(a) were to represent a
single-tree hedge, it would have the set of hedge-nodes
{1, 11, 12, 13, 111, 112, 121, 122, 131, 132, 133},
as shown in Figure 2(b). The depth of the hedge $h = t_1 \cdots t_n$, denoted by
$\mathrm{depth}(h)$, is defined as $\max\{\mathrm{depth}(t_i) \mid i = 1, \ldots, n\}$. For a hedge
$h = t_1 \cdots t_n$, we denote by $\mathrm{top}(h)$ the string obtained by concatenating the
root symbols of all $t_i$s, that is, $\mathrm{lab}^h(1) \cdots \mathrm{lab}^h(n)$.

In the sequel, we adopt the following conventions: we use $t, t_1, t_2, \ldots$ to
denote trees and $h, h_1, h_2, \ldots$ to denote hedges. Hence, when we write
$h = t_1 \cdots t_n$ we tacitly assume that all $t_i$'s are trees. We denote
$\mathrm{Dom}_T$ and $\mathrm{Dom}_H$ simply by Dom, and we denote $\mathrm{lab}^t$ and $\mathrm{lab}^h$
by lab when it is understood
from the context whether we are working with trees or hedges.
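The difference between tree-nodes and hedge-nodes can be checked on the running example with a short Python sketch. As before, the pair-based tree representation, the list-based hedge representation, and all function names are assumptions made for this illustration; the helper that prints a node as a digit string only works while every node has fewer than ten children.

```python
# A hedge is modelled as a list of trees; a tree is (label, [children]).
STORE = ("store", [
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", []), ("discount", [])]),
])

def tree_dom(t):
    """Tree-nodes of a non-empty tree, as tuples of child indices."""
    _, children = t
    return [()] + [(i,) + u
                   for i, child in enumerate(children, start=1)
                   for u in tree_dom(child)]

def hedge_dom(h):
    """Hedge-nodes: every tree-node of the i-th tree, prefixed with i."""
    return [(i,) + u for i, t in enumerate(h, start=1) for u in tree_dom(t)]

def top(h):
    """The string of root labels of the trees in h."""
    return [t[0] for t in h]

def hedge_depth(h):
    """Maximal depth of a tree in h (0 for the empty hedge)."""
    def depth(t):
        return 1 + max((depth(c) for c in t[1]), default=0)
    return max((depth(t) for t in h), default=0)

def as_string(u):
    """Render a node such as (1, 3, 2) as '132' (single-digit indices only)."""
    return "".join(str(i) for i in u)
```

Viewing the store tree as the single-tree hedge [STORE] shifts every address by the prefix 1, which reproduces the node set listed in the text.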



3.2 DTDs and Tree Automata 

We use extended context-free grammars and tree automata to abstract from
DTDs and the various proposals for XML schemas. We parameterize the
definition of DTDs by a class of representations $\mathcal{M}$ of regular string languages such
as, for instance, the class of DFAs (Deterministic Finite Automata) or NFAs
(Non-deterministic Finite Automata). For $M \in \mathcal{M}$, we denote by $L(M)$ the set
of strings accepted by $M$. We then abstract DTDs as follows.

Definition 3.1. Let $\mathcal{M}$ be a class of representations of regular string languages
over $\Sigma$. A DTD is a tuple $(d, s_d)$ where $d$ is a function that maps $\Sigma$-symbols to
elements of $\mathcal{M}$ and $s_d \in \Sigma$ is the start symbol.

For convenience of notation, we denote $(d, s_d)$ by $d$ and leave the start
symbol $s_d$ implicit whenever this cannot give rise to confusion. A tree $t$
satisfies $d$ if (i) $\mathrm{lab}^t(\varepsilon) = s_d$ and, (ii) for every $u \in \mathrm{Dom}(t)$ with $n$ children,
$\mathrm{lab}^t(u1) \cdots \mathrm{lab}^t(un) \in L(d(\mathrm{lab}^t(u)))$. By $L(d)$ we denote the set of trees
satisfying $d$.

Given a DTD $d$, we say that a $\Sigma$-symbol $a$ occurs in $d(b)$ when there exist
$\Sigma$-strings $w_1$ and $w_2$ such that $w_1 a w_2 \in L(d(b))$. We say that $a$ occurs in $d$ if $a$
occurs in $d(b)$ for some $b \in \Sigma$.

We denote by DTD($\mathcal{M}$) the class of DTDs where the regular string languages
are represented by elements of $\mathcal{M}$. The size of a DTD is the sum of the sizes of
the elements of $\mathcal{M}$ used to represent the function $d$.

Example 3.2. The following DTD $(d, \mathrm{store})$ is satisfied by the tree in
Figure 2(a).

d(store) = dvd dvd*
d(dvd) = title price (discount + $\varepsilon$)

This DTD defines the set of trees where the root is labeled with "store"; the
children of "store" are all labeled with "dvd"; and every "dvd"-labeled node
has a "title", "price", and an optional "discount" child.
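Checking whether a tree satisfies a DTD amounts to testing, at every node, that the string of child labels belongs to the language assigned to the node's label. The Python sketch below does this for Example 3.2. Several choices are assumptions of the sketch and not part of the paper: content models are written as Python regular expressions over space-separated labels (instead of DFAs, NFAs, or SL-formulas), trees are pairs (label, children), and a label without an entry in the dictionary is assumed to allow no children.

```python
import re

# Content models of Example 3.2 as regular expressions over space-separated labels.
D = {
    "store": r"dvd( dvd)*",
    "dvd": r"title price( discount)?",
}

STORE = ("store", [
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", [])]),
    ("dvd", [("title", []), ("price", []), ("discount", [])]),
])

def children_ok(t, d):
    """Condition (ii): the child-label string of every node matches d(label)."""
    label, children = t
    word = " ".join(child[0] for child in children)
    # Assumption: labels without an explicit rule only allow the empty string.
    if re.fullmatch(d.get(label, ""), word) is None:
        return False
    return all(children_ok(child, d) for child in children)

def satisfies(t, d, start):
    """Condition (i): the root carries the start symbol; then check all nodes."""
    return t[0] == start and children_ok(t, d)
```

The check visits every node once, so its cost is dominated by the membership tests for the individual content models.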

In some cases, our algorithms are easier to explain on well-behaved DTDs
as considered next. A DTD $d$ is reduced if, for every symbol $a$ that occurs in
$d$, there exists a tree $t \in L(d)$ and a node $u \in \mathrm{Dom}(t)$ such that
$\mathrm{lab}^t(u) = a$. Hence, for example, the DTD $(d, a)$ where $d(a) = a$ is not
reduced: every $a$-labeled node requires an $a$-labeled child, so no finite tree
satisfies $d$. Reducing a DTD(DFA) is in PTIME, while reducing a DTD(SL) is in coNP
(see the corresponding corollary in the Appendix). Here, SL is a logic as defined next.

To define unordered languages, we make use of the specification language
SL inspired by [?] and also used in [?]. The syntax of this language is as
follows:

Definition 3.3. For every $a \in \Sigma$ and natural number $i$, $a^{=i}$ and $a^{\geq i}$ are
atomic SL-formulas; "true" is also an atomic SL-formula. Every atomic SL-formula is
an SL-formula and the negation, conjunction, and disjunction of SL-formulas
are also SL-formulas.

A string $w$ over $\Sigma$ satisfies an atomic formula $a^{=i}$ if it has exactly $i$
occurrences of $a$; $w$ satisfies $a^{\geq i}$ if it has at least $i$ occurrences of $a$.
Furthermore, "true" is satisfied by every string. Satisfaction of Boolean combinations of
atomic formulas is defined in the obvious way.^1 By $w \models \phi$, we denote that $w$
satisfies the SL-formula $\phi$.

As an example, consider the SL-formula
$\neg(\mathrm{discount}^{\geq 1} \wedge \neg\,\mathrm{price}^{\geq 1})$. This
expresses the constraint that a discount can only occur when a price occurs.
The size of an SL-formula is the number of symbols that occur in it, that is,
$\Sigma$-symbols, logical symbols, and numbers (every $i$ in $a^{=i}$ or $a^{\geq i}$ is
written in binary notation).
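Since SL-formulas only count occurrences of symbols, satisfaction can be decided from the multiset of symbols in a string. The following Python sketch evaluates a formula on a string; the encoding of formulas as nested tuples and the function names are assumptions made for this illustration.

```python
from collections import Counter

# Formulas as nested tuples (an encoding assumed for this sketch):
#   ("true",), ("eq", a, i), ("geq", a, i), ("not", f), ("and", f, g), ("or", f, g)
DISCOUNT_NEEDS_PRICE = ("not", ("and", ("geq", "discount", 1),
                                ("not", ("geq", "price", 1))))

def evaluate(formula, counts):
    """Evaluate an SL-formula against the occurrence counts of a string."""
    op = formula[0]
    if op == "true":
        return True
    if op == "eq":
        return counts[formula[1]] == formula[2]
    if op == "geq":
        return counts[formula[1]] >= formula[2]
    if op == "not":
        return not evaluate(formula[1], counts)
    if op == "and":
        return evaluate(formula[1], counts) and evaluate(formula[2], counts)
    if op == "or":
        return evaluate(formula[1], counts) or evaluate(formula[2], counts)
    raise ValueError(f"unknown operator: {op!r}")

def holds(formula, word):
    """True if the sequence of symbols `word` satisfies `formula`."""
    return evaluate(formula, Counter(word))
```

Because only the counts matter, any permutation of a satisfying string is satisfying as well, which is why SL defines unordered languages.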

We recall the definition of non-deterministic tree automata from [?]. We
refer the unfamiliar reader to [26] for a gentle introduction.

Definition 3.4. A nondeterministic tree automaton (NTA) is a 4-tuple
$B = (Q, \Sigma, \delta, F)$, where $Q$ is a finite set of states, $F \subseteq Q$ is the set
of final states, and $\delta : Q \times \Sigma \to 2^{Q^*}$ is a function such that
$\delta(q, a)$ is a regular string language over $Q$ for every $a \in \Sigma$ and $q \in Q$.

^1 The empty string is obtained as $\bigwedge_{a \in \Sigma} a^{=0}$ and the empty set as $\neg\,$true.
For simplicity, we often denote the regular languages in $B$'s transition
function by regular expressions.

A run of $B$ on a tree $t$ is a labeling $\lambda : \mathrm{Dom}(t) \to Q$ such that, for every
$v \in \mathrm{Dom}(t)$ with $n$ children,
$\lambda(v1) \cdots \lambda(vn) \in \delta(\lambda(v), \mathrm{lab}^t(v))$. Note that, when
$v$ has no children, the criterion reduces to $\varepsilon \in \delta(\lambda(v), \mathrm{lab}^t(v))$.
A run is accepting if the root is labeled with an accepting state, that is,
$\lambda(\varepsilon) \in F$. A tree is accepted if there is an accepting run. The set of
all accepted trees is denoted by $L(B)$ and is called a regular tree language.

A tree automaton is bottom-up deterministic if, for all $q, q' \in Q$ with $q \neq q'$
and $a \in \Sigma$, $\delta(q, a) \cap \delta(q', a) = \emptyset$. We denote the set of bottom-up
deterministic NTAs by DTA.

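Example 3.5. We give a bottom-up deterministic tree automaton
$B = (Q, \Sigma, \delta, F)$ which accepts the parse trees of well-formed Boolean
expressions that are true. Here, the alphabet $\Sigma$ is
$\{\wedge, \vee, \neg, \mathrm{true}, \mathrm{false}\}$. The state set $Q$ contains
the states $q_{\mathrm{true}}$ and $q_{\mathrm{false}}$, and the accepting state set $F$ is the
singleton $\{q_{\mathrm{true}}\}$. The transition function of $B$ is defined as follows:

$\delta(q_{\mathrm{true}}, \mathrm{true}) = \varepsilon$. We assign the state $q_{\mathrm{true}}$ to leaves with label "true".

$\delta(q_{\mathrm{false}}, \mathrm{false}) = \varepsilon$. We assign the state $q_{\mathrm{false}}$ to leaves with label "false".

$\delta(q_{\mathrm{true}}, \wedge) = q_{\mathrm{true}}\, q_{\mathrm{true}}^*$.

$\delta(q_{\mathrm{false}}, \wedge) = (q_{\mathrm{true}} + q_{\mathrm{false}})^*\, q_{\mathrm{false}}\, (q_{\mathrm{true}} + q_{\mathrm{false}})^*$.

$\delta(q_{\mathrm{true}}, \vee) = (q_{\mathrm{true}} + q_{\mathrm{false}})^*\, q_{\mathrm{true}}\, (q_{\mathrm{true}} + q_{\mathrm{false}})^*$.

$\delta(q_{\mathrm{false}}, \vee) = q_{\mathrm{false}}\, q_{\mathrm{false}}^*$.

$\delta(q_{\mathrm{true}}, \neg) = q_{\mathrm{false}}$.

$\delta(q_{\mathrm{false}}, \neg) = q_{\mathrm{true}}$.

Consider the tree $t$ depicted in Figure 3(a). The unique accepting run $r$ of
$B$ on $t$ can be graphically represented as shown in Figure 3(b). Formally, the
run of $B$ on $t$ is the function $\lambda : \mathrm{Dom}(t) \to Q : u \mapsto \mathrm{lab}^r(u)$.
Note that $B$ is a DTA.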
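A run of such an automaton can be computed bottom-up by collecting, for every node, the states that some run can assign to it. The Python sketch below does this for the Boolean automaton of Example 3.5. It rests on several assumptions that are not taken from the paper: states are abbreviated to the single characters "t" and "f", the transition languages are written as Python regular expressions over these characters, the labels are spelled "and", "or", "not", and the transitions for the conjunction and disjunction follow the reading in which any positive number of children is allowed.

```python
import re
from itertools import product

STATES = {"t", "f"}   # "t" abbreviates q_true, "f" abbreviates q_false
FINAL = {"t"}
DELTA = {
    ("t", "true"): "", ("f", "false"): "",
    ("t", "and"): "tt*", ("f", "and"): "[tf]*f[tf]*",
    ("t", "or"): "[tf]*t[tf]*", ("f", "or"): "ff*",
    ("t", "not"): "f", ("f", "not"): "t",
}

def states_at(tree):
    """Set of states q such that some run labels the root of `tree` with q."""
    label, children = tree
    child_sets = [states_at(child) for child in children]
    result = set()
    for q in STATES:
        pattern = DELTA.get((q, label))
        if pattern is None:
            continue
        # Try every combination of states the children can take. This is
        # exponential in general; for a bottom-up deterministic automaton
        # each child contributes at most one state.
        if any(re.fullmatch(pattern, "".join(word))
               for word in product(*child_sets)):
            result.add(q)
    return result

def accepted(tree):
    """A tree is accepted if some run assigns a final state to its root."""
    return bool(states_at(tree) & FINAL)
```

For a leaf, the product over no children yields only the empty word, so the check reduces to testing whether the empty string lies in the transition language, as in the definition of a run.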

(a) The tree t.

(b) Graphical representation of the run r of B on t.

Figure 3: Illustrations for Example 3.5.

As for DTDs, we parameterize NTAs by the formalism used to represent the
regular languages in the transition functions $\delta(q, a)$. So, for a class of
representations of regular languages $\mathcal{M}$, we denote by NTA($\mathcal{M}$) the class of NTAs
where all transition functions are represented by elements of $\mathcal{M}$. The size of
an automaton $B$ then is $|Q| + |\Sigma| + \sum_{q \in Q, a \in \Sigma} |\delta(q, a)|$. Here, by
$|\delta(q, a)|$, we denote the size of the automaton accepting $\delta(q, a)$. Unless
explicitly specified otherwise, $\delta(q, a)$ is always represented by an NFA.

In our proofs, we will use reductions from the following decision problems
for string automata:

Emptiness: Given an automaton $A$, is $L(A) = \emptyset$?

Universality: Given an automaton $A$, is $L(A) = \Sigma^*$?

Intersection emptiness: Given the automata $A_1, \ldots, A_n$, is
$L(A_1) \cap \cdots \cap L(A_n) = \emptyset$?
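For DFAs, intersection emptiness can be decided by exploring the product automaton, whose states are tuples containing one state per automaton. The Python sketch below is a straightforward illustration of that idea; the dictionary layout and the three example DFAs over {0, 1} are assumptions of this sketch. The number of product states can grow exponentially with the number of automata, which fits the hardness of the problem for an arbitrary number of DFAs.

```python
from collections import deque

# Hypothetical example DFAs over the alphabet {0, 1}.
PARITY = {("e", "0"): "o", ("o", "0"): "e", ("e", "1"): "e", ("o", "1"): "o"}
EVEN_ZEROS = {"initial": "e", "final": {"e"}, "delta": PARITY}
ODD_ZEROS = {"initial": "e", "final": {"o"}, "delta": PARITY}
ENDS_IN_ONE = {
    "initial": "n", "final": {"y"},
    "delta": {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"},
}

def intersection_is_empty(dfas, alphabet):
    """Breadth-first search over the reachable part of the product automaton."""
    start = tuple(d["initial"] for d in dfas)
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # A tuple of final states witnesses a string accepted by every DFA.
        if all(q in d["final"] for q, d in zip(state, dfas)):
            return False
        for symbol in alphabet:
            successor = []
            for q, d in zip(state, dfas):
                target = d["delta"].get((q, symbol))
                if target is None:   # one DFA is stuck: no joint transition
                    break
                successor.append(target)
            else:
                successor = tuple(successor)
                if successor not in seen:
                    seen.add(successor)
                    queue.append(successor)
    return True
```

Emptiness of a single DFA is the special case of a one-element list.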

The corresponding decision problems for tree automata are defined analogously.

In the Appendix, we show that the following statements hold over the
alphabet $\{0, 1\}$ (see the corresponding corollary there):

1. Intersection emptiness of an arbitrary number of DFAs is PSPACE-hard.

2. Universality of NFAs is PSPACE-hard.

Over the alphabet $\{0, 1, 0', 1'\}$, the following statement holds:

3. Intersection emptiness of an arbitrary number of TDBTAs is EXPTIME-hard.

3.3 Transducers 

We adhere to transducers as a formal model for simple transformations
corresponding to structural recursion [8] and a fragment of top-down XSLT. As
in [?], the abstraction focuses on structure rather than on content. We next
define the tree transducers used in this paper. To simplify notation, we restrict
ourselves to one alphabet. That is, we consider transducers mapping $\Sigma$-trees to
$\Sigma$-trees.^2

For a set $Q$, denote by $\mathcal{H}_\Sigma(Q)$ (respectively $\mathcal{T}_\Sigma(Q)$) the set of
$\Sigma$-hedges (respectively trees) where leaf nodes are labeled with elements from
$\Sigma \cup Q$ instead of only $\Sigma$.

Definition 3.6. A tree transducer is a 4-tuple $T = (Q, \Sigma, q^0, R)$, where $Q$ is
a finite set of states, $\Sigma$ is the input and output alphabet, $q^0 \in Q$ is the initial
state, and $R$ is a finite set of rules of the form $(q, a) \to h$, where $a \in \Sigma$,
$q \in Q$, and $h \in \mathcal{H}_\Sigma(Q)$. When $q = q^0$, $h$ is restricted to be either empty,
or consist of only one tree with a $\Sigma$-symbol as its root label.

The restriction on rules with the initial state ensures that the output is
always a tree rather than a hedge. Transducers are required to be deterministic:
for every pair $(q, a)$, there is at most one rule in $R$.

The translation defined by a tree transducer $T = (Q, \Sigma, q^0, R)$ on a tree $t$
in state $q$, denoted by $T^q(t)$, is inductively defined as follows: if
$t = \varepsilon$ then $T^q(t) := \varepsilon$; if $t = a(t_1 \cdots t_n)$ and there is a rule
$(q, a) \to h \in R$ then $T^q(t)$ is obtained from $h$ by replacing every node $u$ in
$h$ labeled with state $p$ by the hedge $T^p(t_1) \cdots T^p(t_n)$. Note that such
nodes $u$ can only occur at leaves. So, $h$ is only extended downwards. If there is
no rule $(q, a) \to h \in R$ then $T^q(t) := \varepsilon$.
Finally, the transformation of $t$ by $T$, denoted by $T(t)$, is defined as
$T^{q^0}(t)$, interpreted as a tree.

For $a \in \Sigma$, $q \in Q$ and $(q, a) \to h \in R$, we denote $h$ by $\mathrm{rhs}(q, a)$.
If $q$ and $a$ are not important, we say that $h$ is an rhs. The size of $T$ is
$|Q| + |\Sigma| + \sum_{q \in Q, a \in \Sigma} |\mathrm{rhs}(q, a)|$, where $|\mathrm{rhs}(q, a)|$
denotes the number of nodes in $\mathrm{rhs}(q, a)$.
In the sequel, we always use $p, p_1, p_2, \ldots$ and $q, q_1, q_2, \ldots$ to denote states.

Let $q$ be a state of tree transducer $T$ and $a \in \Sigma$. We then define
$q_T[a] := \mathrm{top}(T^q(a))$. For a string $w = a_1 \cdots a_n$, we define
$q_T[w] := q_T[a_1] \cdots q_T[a_n]$. In
the sequel, we leave $T$ implicit whenever $T$ is clear from the context.

We give an example of a tree transducer:

Example 3.7. Let $T = (Q, \Sigma, p, R)$ where $Q = \{p, q\}$, $\Sigma = \{a, b, c, d, e\}$,
and $R$ contains the rules

(p, a) -> d(e)        (p, b) -> d(q)
(q, a) -> c p         (q, b) -> c(p q)

Note that the right-hand side of (q, a) -> c p is a hedge consisting of two trees,
while the other right-hand sides consist of only one tree.

Our tree transducers can be implemented as XSLT programs in a
straightforward way. For instance, the XSLT program equivalent to the above transducer
is given in Figure 4 (we assume the program is started in mode p).

^2 In general, of course, one can define transducers where the input alphabet differs from
the output alphabet.
<xsl:template match="a" mode="p">
  <d>
    <e/>
  </d>
</xsl:template>

<xsl:template match="b" mode="p">
  <d>
    <xsl:apply-templates mode="q"/>
  </d>
</xsl:template>

<xsl:template match="a" mode="q">
  <c/>
  <xsl:apply-templates mode="p"/>
</xsl:template>

<xsl:template match="b" mode="q">
  <c>
    <xsl:apply-templates mode="p"/>
    <xsl:apply-templates mode="q"/>
  </c>
</xsl:template>

Figure 4: The XSLT program equivalent to the transducer of Example 3.7.



Example 3.8. Consider the tree $t$ shown in Figure 5(a). In Figure 5(b) we
give the translation of $t$ by the transducer of Example 3.7. In order to keep
the example simple, we did not list $T^q(\varepsilon)$ and $T^p(\varepsilon)$ explicitly in the
process of translation.

We discuss two important features of tree transducers: copying and deletion.
In Example 3.7, the rule (q, b) -> c(p q) copies the children of the current node
in the input tree twice: one copy is processed in state $p$ and the other in state
$q$. The symbol $c$ is the parent node of the two copies. So, one could say that
the current node is translated into the new parent node labeled $c$. The rule
(q, a) -> c p copies the children of the current node only once. However, no
parent node is given for this copy. So, there is no node in the output tree that
can be interpreted as the translation of the current node in the input tree. We
therefore say that it is deleted. For instance, $T^q(a(b)) = c\,d$ where $d$ corresponds
to $b$ and not to $a$.
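The inductive definition of the translation, including copying and deletion, can be traced with a small Python sketch of the transducer of Example 3.7. The encoding is an assumption of this sketch: output trees are pairs (label, list of children), a right-hand side is a list whose items are either such pairs or plain strings naming a state, and the function names are hypothetical.

```python
# Rules of Example 3.7; a plain string inside a right-hand side is a state.
RULES = {
    ("p", "a"): [("d", [("e", [])])],
    ("p", "b"): [("d", ["q"])],
    ("q", "a"): [("c", []), "p"],
    ("q", "b"): [("c", ["p", "q"])],
}

def translate(state, tree):
    """T^state(tree), returned as a hedge (a list of trees)."""
    label, children = tree
    rhs = RULES.get((state, label))
    if rhs is None:               # no rule: the translation is the empty hedge
        return []
    return expand(rhs, children)

def expand(hedge, input_children):
    """Replace every state leaf by the translations of all input children."""
    output = []
    for item in hedge:
        if isinstance(item, str):                 # a state p
            for child in input_children:
                output.extend(translate(item, child))
        else:                                     # an output node
            out_label, out_children = item
            output.append((out_label, expand(out_children, input_children)))
    return output
```

Applied in state q to the tree a(b), the sketch returns the two-tree hedge c d: the rule for (q, a) emits c and then processes the child b in state p, which yields d, so nothing in the output corresponds to the deleted a-labeled node.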

We define some relevant classes of transducers. A transducer is non-deleting
if no states occur at the top-level of any rhs. We denote by $\mathcal{T}_{nd}$ the class of
non-deleting transducers and by $\mathcal{T}_d$ the class of transducers where we allow
deletion. Furthermore, a transducer $T$ has copying width $k$ if there are at most
$k$ occurrences of states in every sequence of siblings in an rhs. For instance,
the transducer in Example 3.7 has copying width 2. Given a natural number
$k$, which we will leave implicit, we denote by $\mathcal{T}_{bc}$ the class of transducers of
copying width $k$. The abbreviation "bc" stands for bounded copying. We denote
intersections of these classes by combining the indexes. For instance,
$\mathcal{T}_{nd,bc}$ is the class of non-deleting transducers with bounded copying. When we
want to emphasize that we also allow unbounded copying in a certain application, we
write, for instance, $\mathcal{T}_{nd,uc}$ instead of $\mathcal{T}_{nd}$.

(a) The tree t of Example 3.8.

(b) The translation of t by the transducer T of Example 3.7.

Figure 5: A tree and its translation.
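Both properties are syntactic and can be read off the right-hand sides of the rules. The Python sketch below checks them for the rules of Example 3.7, using an encoding that is an assumption of this sketch: a right-hand side is a list whose items are either pairs (label, children) or plain strings naming a state. The function copying_width returns the least k for which the transducer has copying width k.

```python
# Rules of Example 3.7; a plain string inside a right-hand side is a state.
RULES = {
    ("p", "a"): [("d", [("e", [])])],
    ("p", "b"): [("d", ["q"])],
    ("q", "a"): [("c", []), "p"],
    ("q", "b"): [("c", ["p", "q"])],
}

def is_non_deleting(rules):
    """True if no state occurs at the top level of any right-hand side."""
    return all(not isinstance(item, str)
               for rhs in rules.values() for item in rhs)

def copying_width(rules):
    """Largest number of states found in one sequence of siblings of an rhs."""
    def width(siblings):
        here = sum(isinstance(item, str) for item in siblings)
        below = [width(item[1]) for item in siblings
                 if not isinstance(item, str)]
        return max([here] + below)
    return max((width(rhs) for rhs in rules.values()), default=0)
```

For Example 3.7 the sketch reports a deleting transducer, because of the rule for (q, a), with least copying width 2, because of the sibling sequence p q in the rule for (q, b).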



3.4 The Typechecking Problem 

Definition 3.9. A tree transducer $T$ typechecks with respect to an input tree
language $S_{in}$ and an output tree language $S_{out}$, if $T(t) \in S_{out}$ for every
$t \in S_{in}$.

We now define the problem central to this paper.

Definition 3.10. Given $S_{in}$, $S_{out}$, and $T$, the typechecking problem consists in
verifying whether $T$ typechecks with respect to $S_{in}$ and $S_{out}$.

We parameterize the typechecking problem by the kind of tree transducers
and tree languages we allow. Let $\mathcal{T}$ be a class of transducers and $\mathcal{S}$ be a
representation of a class of tree languages. Then $TC[\mathcal{T}, \mathcal{S}]$ denotes the typechecking
problem where $T \in \mathcal{T}$ and $S_{in}, S_{out} \in \mathcal{S}$. Examples of classes of tree languages
are those defined by tree automata or DTDs. Classes of transducers are
discussed in the previous section. The complexity of the problem is measured in
terms of the sum of the sizes of the input and output schemas $S_{in}$ and $S_{out}$ and
the transducer $T$.

Table 1 summarizes the results obtained in [23]. Unless specified otherwise,
all problems are complete for the mentioned complexity classes. In the
setting of [23], typechecking is only tractable when restricting to non-deleting and
bounded copying transducers in the presence of DTDs with DFAs.

        NTA       DTA                       DTD(NFA)   DTD(DFA)   DTD(SL)
d,uc    EXPTIME   EXPTIME                   EXPTIME    EXPTIME    EXPTIME
nd,uc   EXPTIME   EXPTIME                   PSPACE     PSPACE     coNP
nd,bc   EXPTIME   in EXPTIME, PSPACE-hard   PSPACE     PTIME      coNP

Table 1: Results of [23] (upper and lower bounds). The top row shows the
representation of the input and output schemas and the left column shows the
class of tree transducer: "d", "nd", "uc", and "bc" stand for "deleting",
"non-deleting", "unbounded copying", and "bounded copying" respectively.

Recall that, in this article, we are interested in variants of the typechecking
problem where the input and/or output schema is fixed. We therefore introduce
some notations that are central to the paper. We denote the typechecking
problem where the input schema, the output schema, or both are fixed by
$TC^{i}[\mathcal{T}, \mathcal{S}]$, $TC^{o}[\mathcal{T}, \mathcal{S}]$, and $TC^{io}[\mathcal{T}, \mathcal{S}]$, respectively.
The complexity of these subproblems is measured in terms of the sum of the sizes
of the input and output schemas $S_{in}$ and $S_{out}$, and the transducer $T$, minus the
size of the fixed schema(s).

4 Main Results 



fixed               TT      NTA       DTA       DTD(NFA)   DTD(DFA)   DTD(SL)
in, out, in + out   d,uc    EXPTIME   EXPTIME   EXPTIME    EXPTIME    EXPTIME
                    d,bc    EXPTIME   EXPTIME   EXPTIME    EXPTIME    EXPTIME
in                  nd,uc   EXPTIME   EXPTIME   PSPACE     PSPACE     in PTIME
                    nd,bc   EXPTIME   EXPTIME   PSPACE     NL         in PTIME
out                 nd,uc   EXPTIME   EXPTIME   PSPACE     PSPACE     coNP
                    nd,bc   EXPTIME   EXPTIME   PTIME      PTIME      coNP
in + out            nd,uc   EXPTIME   EXPTIME   NL         NL         NL
                    nd,bc   EXPTIME   EXPTIME   NL         NL         NL

Table 2: Complexities of the typechecking problem in the new setting (upper and
lower bounds). The top row shows the representation of the input and output
schemas, the leftmost column shows which schemas are fixed, and the second
column to the left shows the class of tree transducer: "d", "nd", "uc", and
"bc" stand for "deleting", "non-deleting", "unbounded copying", and "bounded
copying" respectively. In the case of deleting transformations, the different
possibilities are grouped as all complexities coincide.
As argued in the Introduction, it makes sense to consider the input and/or
output schema not as part of the input for some scenarios. From a complexity
theory point of view, it is important to note here that the input and/or output
alphabet then also becomes fixed. In this article, we revisit the results of [23]
from that perspective.

The results are summarized in Table 2. As some results already follow from
proofs in [33], we printed the results requiring a new proof in bold. The entries
where the complexity was lowered (assuming that the complexity classes in
question are different) are underlined. Again, all problems are complete for the
mentioned complexity classes unless specified otherwise.

We discuss the obtained results: for non-deleting transformations, we get
three new tractable cases: (i) fixed input schema, unbounded copying, and
DTD(SL)s; (ii) fixed output schema, bounded copying and DTD(NFA)s; and,
(iii) fixed input and output, unbounded copying and all DTDs. It is striking,
however, that in the presence of deletion or tree automata (even deterministic
ones) typechecking remains EXPTIME-hard for all scenarios.

Mostly, we only needed to strengthen the lower bound proofs of [23].

4.1 Deletion: Fixed Input Schema, Fixed Output Schema, 
and Fixed Input and Output Schema 

The EXPTIME upper bound for typechecking already follows from [23].
Therefore, it remains to show the lower bounds for
$TC^{io}[\mathcal{T}_{d,bc}, \mathrm{DTD(DFA)}]$ and $TC^{io}[\mathcal{T}_{d,bc}, \mathrm{DTD(SL)}]$,
which we do in Theorem 4.1. In fact, it follows from the proof that the lower
bounds already hold for transducers with copying width 2.

We require the notion of top-down deterministic binary tree automata in the
proof of Theorem 4.1. A binary tree automaton (BTA) is a non-deterministic
tree automaton $B = (Q, \Sigma, \delta, F)$ operating on binary trees. These are trees
where every node has zero, one, or two children. We assume that the alphabet
is partitioned in internal labels and leaf labels. When a label $a$ is an internal
label, the regular language $\delta(q, a)$ only contains strings of length one or two.
When $a$ is a leaf label, the regular language $\delta(q, a)$ only contains the empty
string. A binary tree automaton is top-down deterministic if (i) $F$ is a singleton
and, (ii) for every $q \in Q$ and $a \in \Sigma$, $\delta(q, a)$ contains at most
one string. We abbreviate "top-down deterministic binary tree automaton" by
TDBTA.

Theorem 4.1. 1. $TC^{io}[\mathcal{T}_{d,bc}, DTD(DFA)]$ is EXPTIME-complete; and

2. $TC^{io}[\mathcal{T}_{d,bc}, DTD(SL)]$ is EXPTIME-complete.

Proof. The EXPTIME upper bound follows from Theorem 11 in [23]. We proceed
by proving the lower bounds.

We give a LOGSPACE reduction from the intersection emptiness problem of
an arbitrary number of top-down deterministic binary tree automata (TDBTAs)
over the alphabet $\Sigma = \{0, 1, 0', 1'\}$. The intersection emptiness problem of
TDBTAs over the alphabet $\{0, 1, 0', 1'\}$ is known to be EXPTIME-hard (cf. the
corresponding corollary in the Appendix).

Figure 6: Structure of the trees defined by the input schema in the proof of
Theorem 4.1.

For $i = 1, \ldots, n$, let $A_i = (Q_i, \Sigma, \delta_i, \{\mathrm{start}_i\})$ be a TDBTA, with
$\Sigma = \{0, 1, 0', 1'\}$. Without loss of generality, we can assume that the state sets $Q_i$ are
pairwise disjoint. We call 0 and 1 internal labels and 0' and 1' leaf labels. In our
proof, we use the markers $\ell$ and $r$ to denote that a certain node is a left or a
right child. Formally, define $\Sigma_\ell := \{a_\ell \mid a \in \Sigma\}$ and $\Sigma_r := \{a_r \mid a \in \Sigma\}$. We use
symbols from $\Sigma_\ell$ and $\Sigma_r$ for the left and right children of nodes, respectively.

We now define a transducer T and two DTDs $d_{in}$ and $d_{out}$ such that
$\bigcap_{i=1}^{n} L(A_i) = \emptyset$ if and only if T typechecks with respect to $d_{in}$ and $d_{out}$.
In the construction, we exploit the copying power of transducers to make n copies of the
input tree: one for each $A_i$. By using deleting states, we can execute each $A_i$ on its copy
of the input tree without producing output. When an $A_i$ does not accept, we
output an error symbol under the root of the output tree. The output DTD
should then only check that an error symbol always appears. A bit of care
needs to be taken, as a bounded copying transducer cannot make an arbitrary
number of copies of the input tree in the same rule. The transducer therefore
goes through an initial copying phase where it repeatedly copies part of the
input tree twice, until there are (at least) n copies. The transducer remains in
the copying phase as long as it processes special symbols "#". The input trees
are therefore of the form as depicted in Figure 6. In addition, the transducer
should verify that the number of #-symbols in the input equals $\lceil \log n \rceil$.

The input DTD $(d_{in}, s)$, which we will describe next, uses the alphabet
$\Sigma_\ell \cup \Sigma_r \cup \{s, \#\}$, and defines all trees of the form as described in Figure 6,
where s and # are alphabet symbols, and every internal node of t (which is
depicted in Figure 6) has one or two children. When a node is an only child, it
is labeled with an element of $\Sigma_\ell$. Otherwise, it is labeled with an element of $\Sigma_\ell$
or an element of $\Sigma_r$ if it is a left child or a right child, respectively. In this way,
the transducer knows whether a node is a left or a right child by examining the
label. The root symbol of t is labeled with a symbol from $\Sigma_\ell$. Furthermore, all
internal nodes of t are labeled with labels in $\{0_\ell, 0_r, 1_\ell, 1_r\}$ and all leaf nodes
are labeled with labels in $\{0'_\ell, 0'_r, 1'_\ell, 1'_r\}$. As explained above, we will use the
sequence of #-symbols to make a sufficient number of copies of t.
The input DTD $(d_{in}, s)$ is defined as follows:

• $d_{in}(s) = \# + 0_\ell + 1_\ell$;

• $d_{in}(\#) = \# + 0_\ell + 1_\ell$;

• for each $a \in \{0_\ell, 1_\ell, 0_r, 1_r\}$,
$d_{in}(a) = (0_\ell + 1_\ell + 0'_\ell + 1'_\ell) + (0_\ell + 1_\ell + 0'_\ell + 1'_\ell)(0_r + 1_r + 0'_r + 1'_r)$; and,

• for each $a \in \{0'_\ell, 1'_\ell, 0'_r, 1'_r\}$, $d_{in}(a) = \varepsilon$.

Obviously, $(d_{in}, s)$ can be expressed as a DTD(DFA). It can also be expressed
as a DTD(SL), as follows:

M«) = ((MOF 1 ] v ^[lj 1 ] v vKoT 1 } v ¥>[(ii) =1 ])) 
e(( ¥ )[or 1 ] v ^[lf] v ^) =1 ] v ^r 1 ]) 

a (^[or 1 ] v ^[i^ 1 ] v ^r 1 ] v <^[(i;) =l ]))) 

A S =° A #=° 
for every $a \in \{0_\ell, 1_\ell, 0_r, 1_r\}$, where

• $\oplus$ denotes the "exclusive or";

• for every $i \in \{\ell, r\}$ and $x \in \{0_i, 1_i, 0'_i, 1'_i\}$, $\varphi[x^{=1}]$ denotes the conjunction
$\big(x^{=1} \wedge \bigwedge_{y \in \{0_i, 1_i, 0'_i, 1'_i\} - \{x\}} y^{=0}\big)$.

Notice that the size of the SL-formula expressing $d_{in}(a)$ is constant.

We construct a tree transducer $T = (Q_T, \Sigma_T, q^{\varepsilon}_{copy}, R_T)$. The alphabet of T
is $\Sigma_T = \Sigma_\ell \cup \Sigma_r \cup \{s, \#, \text{error}, \text{ok}\}$. The state set $Q_T$ contains the set
$\{q^\ell, q^r \mid q \in Q_i, i \in \{1, \ldots, n\}\}$. The transducer will use $\lceil \log n \rceil$ special copying
states $q^{i}_{copy}$ to make at least n copies of the input tree. To define $Q_T$ formally,
we first introduce the notation $D(k)$, for $k = 0, \ldots, \lceil \log n \rceil$. Intuitively, $D(k)$
corresponds to the set of nodes of a complete binary tree of depth $k + 1$. For
example, $D(1) = \{\varepsilon, 0, 1\}$ and $D(2) = \{\varepsilon, 0, 1, 00, 01, 10, 11\}$. The idea is that,
if $i \in D(k) \setminus D(k - 1)$, for $k > 0$, then i represents the binary encoding of a
number in $\{0, \ldots, 2^k - 1\}$. Formally, if $k = 0$, then $D(k) = \{\varepsilon\}$; otherwise,
$D(k) = D(k - 1) \cup \bigcup_{j = 0, 1} \{ij \mid i \in D(k - 1)\}$. The state set $Q_T$ is then the
union of the sets $Q^\ell = \{q^\ell \mid q \in Q_j, 1 \leq j \leq n\}$, $Q^r = \{q^r \mid q \in Q_j, 1 \leq j \leq n\}$,
the set $\{q^{j}_{copy} \mid j \in D(\lceil \log n \rceil)\}$ and the set $\{\mathrm{start}_j \mid n + 1 \leq j \leq 2^{\lceil \log n \rceil}\}$. Note
that the last set can be empty. It only contains dummy states translating any
input to the empty string.
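
The recursive definition of D(k) can be illustrated with a short Python sketch. It is only an illustration under our own conventions: the names `D`, `copy_levels` and `leaf_indices` are hypothetical, the empty Python string stands in for ε, and ⌈log n⌉ is computed with integer arithmetic.

```python
def D(k):
    """Return D(k): all binary strings of length at most k ('' plays the role of epsilon)."""
    if k == 0:
        return {""}
    prev = D(k - 1)
    # D(k) = D(k-1) united with {ij | i in D(k-1), j in {0, 1}}
    return prev | {i + j for i in prev for j in "01"}


def copy_levels(n):
    """Return ceil(log2(n)) for n >= 1, using integer arithmetic only."""
    return (n - 1).bit_length()


def leaf_indices(n):
    """Indices in D(ceil(log n)) - D(ceil(log n) - 1): one index per produced copy of t."""
    k = copy_levels(n)
    return D(k) - (D(k - 1) if k > 0 else set())
```

For n = 5 automata, the sketch yields three copying levels and eight index strings of length three, so at least n copies are available and the surplus ones correspond to the dummy states.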

We next describe the action of the tree transducer T. Roughly, the operation
of T on the input $s(\#(\#(\cdots \#(t))))$ can be divided in two parts: (i) copying
the tree t a sufficient number of times while reading the #-symbols; and, (ii)
simulating one of the TDBTAs on each copy of t. The tree transducer outputs
the symbol "error" when one of the TDBTAs rejects t, or when the number
of #-symbols in its input is not equal to $\lceil \log n \rceil$. Apart from copying the root
symbol s to the output tree, T only writes the symbol "error" to the output.
Hence, the output tree always has a root labeled s which has zero or more
children labeled "error". The output DTD, which we define later, should then
verify whether the root always has at least one "error"-labeled child.

Formally, the transition rules in $R_T$ are defined as follows:

• $(q^{\varepsilon}_{copy}, s) \to s(q^{0}_{copy}\, q^{1}_{copy})$. This rule puts s as the root symbol of the
output tree.

• $(q^{i}_{copy}, \#) \to q^{i0}_{copy}\, q^{i1}_{copy}$ for $i \in D(\lceil \log n \rceil - 1) - \{\varepsilon\}$. These rules copy
the tree t in the input at least n times, provided that there are enough
#-symbols.

• $(q^{i}_{copy}, \#) \to \mathrm{start}^{\ell}_k$, where $i \in D(\lceil \log n \rceil) - D(\lceil \log n \rceil - 1)$, and i is the
binary representation of k. This rule starts the in-parallel simulation of the
$A_i$'s. For $i = n + 1, \ldots, 2^{\lceil \log n \rceil}$, $\mathrm{start}^{\ell}_i$ is just a dummy state transforming
everything to the empty tree.

• $(q^{i}_{copy}, a) \to \text{error}$ for $a \in \Sigma_\ell$ and $i \in D(\lceil \log n \rceil)$. This rule makes sure that
the output of T is accepted by the output tree automaton if there are not
enough #-symbols in the input.

• $(\mathrm{start}^{\ell}_k, \#) \to \text{error}$ for all $k = 1, \ldots, 2^{\lceil \log n \rceil}$. This rule makes sure that
the output of T is accepted by the output tree automaton if there are too
many #-symbols in the input.

• $(q^\ell, a_r) \to \varepsilon$ and $(q^r, a_\ell) \to \varepsilon$ for all $q \in Q_j$, $j = 1, \ldots, n$. This rule ensures
that tree automata states intended for left (respectively right) children are
not applied to right (respectively left) children.

• $(q^\ell, a_\ell) \to q_1^\ell q_2^r$ and $(q^r, a_r) \to q_1^\ell q_2^r$, for every $q \in Q_i$, $i = 1, \ldots, n$, such
that $\delta_i(q, a) = q_1 q_2$, and a is an internal symbol. This rule does the actual
simulation of the tree automata $A_i$, $i = 1, \ldots, n$.

• $(q^\ell, a_\ell) \to q_1^\ell$ and $(q^r, a_r) \to q_1^\ell$, for every $q \in Q_i$, $i = 1, \ldots, n$, such
that $\delta_i(q, a) = q_1$ and a is an internal symbol. This rule does the actual
simulation of the tree automata $A_i$, $i = 1, \ldots, n$.

• $(q^\ell, a_\ell) \to \varepsilon$ and $(q^r, a_r) \to \varepsilon$ for every $q \in Q_i$, $i = 1, \ldots, n$, such that
$\delta_i(q, a) = \varepsilon$ and a is a leaf symbol. This rule simulates accepting
computations of the $A_i$'s.

• $(q^\ell, a_\ell) \to \text{error}$ and $(q^r, a_r) \to \text{error}$ for every $q \in Q_i$, $i = 1, \ldots, n$, such
that $\delta_i(q, a)$ is undefined. This rule simulates rejecting computations of
the $A_i$'s.

It is straightforward to verify that, on input $s(\#(\#(\cdots \#(t))))$, T outputs the
tree s if and only if there are $\lceil \log n \rceil$ #-symbols in the input and
$t \in L(A_1) \cap \cdots \cap L(A_n)$.

Finally, $d_{out}(s) = \text{error}\ \text{error}^*$, which can easily be defined as a DTD(DFA)
and as a DTD(SL).

It is easy to see that the reduction can be carried out in deterministic logarithmic
space, that T has copying width 2, and that $d_{in}$ and $d_{out}$ do not depend
on $A_1, \ldots, A_n$. □

4.2 Non-deleting: Fixed Input Schema 

We turn to the typechecking problem in which we consider the input schema as
fixed. We start by showing that typechecking is in PTIME in the case where we
use DTDs with SL-expressions and the tree transducer is non-deleting (Theorem 4.3).
To this end, we recall a lemma and introduce some notions
that are needed for the proof of Theorem 4.3.

For an SL-formula $\varphi$, we say that two strings $w_1$ and $w_2$ are $\varphi$-equivalent
(denoted $w_1 \equiv_\varphi w_2$) if $w_1 \models \varphi$ if and only if $w_2 \models \varphi$.

For $a \in \Sigma$ and $w \in \Sigma^*$, we denote by $\#_a(w)$ the number of a's occurring in
w. We recall Lemma 17 from [23]:

Lemma 4.2. Let $\varphi$ be an SL-formula and let k be the largest integer occurring
in $\varphi$. For every $w, w' \in \Sigma^*$, if, for every $a \in \Sigma$,

• $\#_a(w') > k$ when $\#_a(w) > k$, and

• $\#_a(w') = \#_a(w)$, otherwise,

then $w \equiv_\varphi w'$.
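
The lemma says that an SL-formula cannot distinguish two strings whose symbol counts agree once every count above k is capped. The following Python sketch illustrates this on a toy encoding of SL-formulas. The tuple-based encoding, the atom names `"eq"` and `"ge"` (standing for atoms of the form a^{=i} and a^{≥i}), and the function names are assumptions made for this illustration only.

```python
from collections import Counter


def capped_profile(w, alphabet, k):
    """Count each symbol of `alphabet` in w, treating every count above k as k + 1."""
    counts = Counter(w)
    return tuple(min(counts[a], k + 1) for a in alphabet)


def holds(formula, w):
    """Evaluate a toy SL-formula, given as nested tuples, on the string w."""
    counts = Counter(w)
    op = formula[0]
    if op == "eq":                      # atom a^{=i}
        return counts[formula[1]] == formula[2]
    if op == "ge":                      # atom a^{>=i}
        return counts[formula[1]] >= formula[2]
    if op == "not":
        return not holds(formula[1], w)
    if op == "and":
        return all(holds(f, w) for f in formula[1:])
    if op == "or":
        return any(holds(f, w) for f in formula[1:])
    raise ValueError("unknown operator: %r" % (op,))


# Largest integer in phi is k = 2; "aaab" and "aaaaab" have the same capped profile.
phi = ("and", ("ge", "a", 2), ("eq", "b", 1))
same = capped_profile("aaab", "ab", 2) == capped_profile("aaaaab", "ab", 2)
agree = holds(phi, "aaab") == holds(phi, "aaaaab")
```

In this example both strings contain more than k occurrences of a and exactly one b, so the lemma applies and the formula gives the same verdict on both.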

For a hedge h and a DTD d, we say that h partly satisfies d if for every
$u \in \mathrm{Dom}(h)$, $\mathrm{lab}^h(u1) \cdots \mathrm{lab}^h(un) \in L(d(\mathrm{lab}^h(u)))$ where u has n children.
Note that there is no requirement on the root nodes of the trees in h. Hence,
the term "partly".

We are now ready to show the first PTIME result: 

Theorem 4.3. $TC^{i}[\mathcal{T}_{nd,uc}, DTD(SL)]$ is in PTIME.

Proof. Denote the tree transformation by $T = (Q_T, \Sigma, q^0_T, R_T)$ and the input
and output DTDs by $(d_{in}, s_{in})$ and $(d_{out}, s_{out})$, respectively. As $d_{in}$ is fixed, we
can assume that $d_{in}$ is reduced.

Figure 7: Illustration of the typechecking algorithm in the proof of Theorem 4.3.

Intuitively, the typechecking algorithm is successful when T does not typecheck
with respect to $d_{in}$ and $d_{out}$. The outline of the typechecking algorithm
is as follows:

1. Compute the set RP of "reachable pairs" (q, a) for which there exists a
tree $t \in L(d_{in})$ and a node $u \in \mathrm{Dom}(t)$ such that $\mathrm{lab}^t(u) = a$ and T visits
u in state q. That is, we compute all pairs (q, a) such that either

• $q = q^0_T$ and $a = s_{in}$; or

• $(q', a') \in RP$, there is a q-labeled node in rhs(q', a'), and there exists
a string $w_1 a w_2 \in d_{in}(a')$ for $w_1, w_2 \in \Sigma^*$.

2. For each such pair (q, a) and for each node $v \in \mathrm{Dom}(\mathrm{rhs}(q, a))$, test
whether there exists a string $w \in d_{in}(a)$ such that $T^q(a(w))$ does not
partly satisfy $d_{out}$. We call w a counterexample.

The algorithm is successful if and only if there exists a counterexample.
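
Step 1 is a plain worklist computation. The sketch below shows one possible way to organise it in Python, under simplifying assumptions that are ours: `rhs_states[(q, a)]` is assumed to hold the set of states occurring in rhs(q, a), and `child_labels[a]` the set of labels occurring in some string of the (reduced) language d_in(a). Both names, and the function name, are hypothetical.

```python
from collections import deque


def reachable_pairs(q0, s_in, rhs_states, child_labels):
    """Compute the set RP of reachable (state, label) pairs by breadth-first search."""
    rp = {(q0, s_in)}
    queue = deque(rp)
    while queue:
        q, a = queue.popleft()
        # Every state in rhs(q, a) is applied to every child of the a-labeled node.
        for q2 in rhs_states.get((q, a), ()):
            for a2 in child_labels.get(a, ()):
                if (q2, a2) not in rp:
                    rp.add((q2, a2))
                    queue.append((q2, a2))
    return rp


# Toy instance: the root s has children labeled a or b, and a has children labeled b.
example = reachable_pairs(
    "q0", "s",
    rhs_states={("q0", "s"): {"q"}, ("q", "a"): {"q"}},
    child_labels={"s": {"a", "b"}, "a": {"b"}},
)
```

Each pair is enqueued at most once, so the number of iterations is bounded by the number of pairs in Q_T × Σ, in line with the polynomial bound argued below.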

We illustrate the general operation of the typechecking algorithm in Figure 7.
In this figure, T visits the a-labeled node on the left in state q. Consequently,
T outputs the hedge rhs(q, a), which is illustrated by dotted lines on the right.
The typechecking algorithm searches for a node u in rhs(q, a) (which is labeled
by c in the figure), such that the string of children of u is not in $L(d_{out}(c))$.

Notice that the typechecking algorithm does not assume that $d_{out}$ is reduced
(recall the definition of a reduced DTD). We need to show that
the algorithm is correct, that is, there exists a counterexample if and only if T
does not typecheck with respect to $d_{in}$ and $d_{out}$. Clearly, when the algorithm
does not find a counterexample, T typechecks with respect to $d_{in}$ and $d_{out}$.
Conversely, suppose that the algorithm finds a pair (q, a) and a string w such that
$T^q(a(w))$ does not partly satisfy $d_{out}$. So, since we assumed that $d_{in}$ is reduced,
there exists a tree $t \in L(d_{in})$ and a node $u \in \mathrm{Dom}(t)$ such that $\mathrm{lab}^t(u) = a$ and
u is visited by T in state q. Also, there exists a node v in $T^q(a(w))$, such that
the label of v is c and the string of children of v is not in $d_{out}(c)$. We argue
that $T(t) \notin L(d_{out})$. There are two cases:

(i) if $L(d_{out})$ contains a tree with a c-labeled node, then $T(t) \notin L(d_{out})$ since
$T^q(a(w))$ does not partly satisfy $d_{out}$; and

(ii) if $L(d_{out})$ does not contain a tree with a c-labeled node, then $T(t) \notin L(d_{out})$
since T(t) contains a c-labeled node.

We proceed by showing that the algorithm can be carried out in polynomial 
time. As the input schema is fixed, step (1) of the algorithm is in polynomial 
time. Indeed, we can compute the set RP of reachable pairs (q, a) in a top-down 
manner by a straightforward reachability algorithm. 

To show that step (2) of the typechecking algorithm is in polynomial time, fix
a tuple (q, a) that was computed in step (1) and a node u in rhs(q, a) with label b.
Let $z_0 q_1 z_1 \cdots q_n z_n$ be the concatenation of u's children, where all $z_0, \ldots, z_n \in \Sigma^*$
and $q_1, \ldots, q_n \in Q_T$. We now search for a string $w \in \Sigma^*$ for which $w \models d_{in}(a)$,
but for which $z_0 q_1[w] z_1 \cdots q_n[w] z_n \not\models d_{out}(b)$. Recall that $q[w]$
is the homomorphic extension of $q[a]$ for $a \in \Sigma$, which is top(rhs(q, a)) in the
case of non-deleting tree transducers.

Denote $d_{in}(a)$ by $\varphi$. Let $\{a_1, \ldots, a_s\}$ be the different symbols occurring in
$\varphi$ and let k be the largest integer occurring in $\varphi$. According to Lemma 4.2,
every $\Sigma$-string is $\varphi$-equivalent to a string of the form $w = a_1^{m_1} \cdots a_s^{m_s}$ with
$0 \leq m_i \leq k + 1$ for each $i = 1, \ldots, s$. Note that there are $(k + 2)^s$ such strings,
which is a constant number, as it only depends on the input schema. For the
following, the algorithm considers each such string w.

Fix such a string w such that $w \models \varphi$. For each symbol c in $d_{out}(b)$, the
number $\#_c(z_0 q_1[w] z_1 \cdots q_n[w] z_n)$ is equal to the linear sum

$$k^c_1 \times \#_{a_1}(w) + \cdots + k^c_\ell \times \#_{a_\ell}(w) + k^c_{\ell+1} \times \#_{a_{\ell+1}}(w) + \cdots + k^c_s \times \#_{a_s}(w) + k^c,$$

where $k^c = \#_c(z_0 \cdots z_n)$ and for each $i = 1, \ldots, s$, $k^c_i = \#_c(q_1[a_i] \cdots q_n[a_i])$. We
now must test if there exists a string $w' \equiv_\varphi w$ such that
$z_0 q_1[w'] z_1 \cdots q_n[w'] z_n \not\models d_{out}(b)$. Let $a_1, \ldots, a_\ell$ be the symbols that occur at least
$k + 1$ times in w and $a_{\ell+1}, \ldots, a_s$ be the symbols that occur at most k times in w, respectively.
Then, deciding whether w' exists is equivalent to finding an integer solution to
the variables $x_{a_1}, \ldots, x_{a_s}$ for the boolean combination of linear (in)equalities
$\Phi = \Phi_1 \wedge \neg\Phi_2$, where

• $\Phi_1$ states that $w' \equiv_\varphi w$, that is,

$$\Phi_1 = \bigwedge_{i=1}^{\ell} (x_{a_i} > k) \wedge \bigwedge_{j=\ell+1}^{s} (x_{a_j} = \#_{a_j}(w));$$

and

• $\Phi_2$ states that $z_0 q_1[w'] z_1 \cdots q_n[w'] z_n \models d_{out}(b)$, that is, $\Phi_2$ is defined by
replacing every occurrence of $c^{=i}$ or $c^{\geq i}$ in $d_{out}(b)$ by the equation

$$\sum_{j=1}^{s} (k^c_j \times x_{a_j}) + k^c = i$$

or by

$$\sum_{j=1}^{s} (k^c_j \times x_{a_j}) + k^c \geq i,$$

respectively.

In the above (in)equalities, $x_{a_i}$, $1 \leq i \leq s$, represents the number of occurrences
of $a_i$ in w'.

Finding a solution for $\Phi$ now consists of finding integer values for $x_{a_1}, \ldots, x_{a_s}$
so that $\Phi$ evaluates to true. A corollary in the Appendix shows that we can
decide in PTIME whether such a solution for $\Phi$ exists. □

Theorem 4.4. $TC^{i}[\mathcal{T}_{nd,bc}, DTD(DFA)]$ is NLOGSPACE-complete.

Proof. In Theorem 4.9(2), we prove that the problem is NLOGSPACE-hard, even
if both the input and output schemas are fixed. Hence, it remains to show that
the problem is in NLOGSPACE.

Let us denote the tree transformation by $T = (Q_T, \Sigma, q^0_T, R_T)$ and the input
and output DTDs by $(d_{in}, r)$ and $d_{out}$, respectively. We can assume that $d_{in}$ is
reduced (see footnote 3).

Then, the typechecking algorithm can be summarized as follows: 

1. Guess a sequence of pairs $(q_0, a_0), (q_1, a_1), \ldots, (q_n, a_n)$ in $Q_T \times \Sigma_{in}$, such
that

• $(q_0, a_0) = (q^0_T, r)$; and

• for every pair $(q_i, a_i)$, $q_{i+1}$ occurs in $\mathrm{rhs}(q_i, a_i)$ and $a_{i+1}$ occurs in
some string in $L(d_{in}(a_i))$.

We only need to remember $(q_n, a_n)$ as a result of this step.

2. Guess a node u in $\mathrm{rhs}(q_n, a_n)$ (say that u is labeled with b) and test
whether there exists a string $w \in d_{in}(a_n)$ such that $T^{q_n}(a_n(w))$ does not
partly satisfy $d_{out}$.

The algorithm is successful if and only if w exists and, hence, the instance does
not typecheck.

The first step is a straightforward reachability algorithm, which is in NLOGSPACE. 
It remains to show that the second step is in NLOGSPACE. 

Let (q, a) be the pair $(q_n, a_n)$ computed in the first step. Let
$d_{out}(b) = (Q_{out}, \Sigma, \delta_{out}, \{p_I\}, \{p_F\})$ be a DFA and let k be the copying bound of T.
Let $z_0 q_1 z_1 \cdots q_\ell z_\ell$ be the concatenation of u's children, where $\ell \leq k$. So we want
to check whether there exists a string w such that $z_0 q_1[w] z_1 \cdots q_\ell[w] z_\ell$ is not
accepted by $d_{out}(b)$. We guess w one symbol at a time and simulate in parallel $\ell$ copies
of $d_{out}(b)$ and one copy of $d_{in}(a)$.

3 Reducing $d_{in}$ would be PTIME-complete otherwise; see the corresponding corollary in the Appendix.

By $\hat\delta$ we denote the canonical extension of $\delta$ to strings in $\Sigma^*$. We start by
guessing states $p_1, \ldots, p_\ell$ of $d_{out}(b)$, where $p_1 = \hat\delta_{out}(p_I, z_0)$, and keep a copy of
these on tape, to which we refer as $p'_1, \ldots, p'_\ell$. Next, we keep on guessing symbols
c of w, whereafter we replace each $p_i$ by $\hat\delta_{out}(p_i, q_i[c])$. The input automaton
obviously starts in its initial state and is simulated in the straightforward way.

The machine non-deterministically stops guessing, and checks whether, for
each $i = 1, \ldots, \ell - 1$, $\hat\delta_{out}(p_i, z_i) = p'_{i+1}$ and $\hat\delta_{out}(p_\ell, z_\ell) \neq p_F$, so that the
output string is indeed not accepted by $d_{out}(b)$. For the input
automaton, it simply checks whether the current state is the final state. If the
latter tests are positive, then the algorithm accepts, otherwise, it rejects.

We only keep $2\ell + 1$ states on tape, which is a constant number, so the
algorithm runs in NLOGSPACE. □

Theorem 4.5. 1. $TC^{i}[\mathcal{T}_{nd,uc}, DTD(DFA)]$ is PSPACE-complete; and

2. $TC^{i}[\mathcal{T}_{nd,bc}, DTD(NFA)]$ is PSPACE-complete.

Proof. In [23], it was shown that both problems are in PSPACE. We proceed by
showing that they are also PSPACE-hard.

(1) We reduce the intersection emptiness problem of an arbitrary number of
deterministic finite automata $M_1, \ldots, M_n$ with alphabet {0, 1} to the typechecking problem.
This problem is known to be PSPACE-hard, as shown in the Appendix. Our reduction
only requires logarithmic space. We define a transducer
$T = (Q_T, \{0, 1, \#_0, \ldots, \#_n\}, q^0_T, R_T)$ and two DTDs $d_{in}$ and $d_{out}$ such
that T typechecks with respect to $d_{in}$ and $d_{out}$ if and only if $\bigcap_{i=1}^{n} L(M_i) = \emptyset$.

The DTD $(d_{in}, s)$ defines trees of depth two, where the string formed by
the children of the root is an arbitrary string in $\{0, 1\}^*$, so $d_{in}(s) = (0 + 1)^*$.
The transducer makes n copies of this string, separated by the delimiters $\#_i$:
$Q_T = \{q, q^0_T\}$ and $R_T$ contains the rules $(q^0_T, s) \to s(\#_0 q \#_1 q \cdots \#_{n-1} q \#_n)$ and
$(q, a) \to a$, for every $a \in \Sigma$. Finally, $(d_{out}, s)$ defines a tree of depth two as
follows:

$$d_{out}(s) = \{\#_0 w_1 \#_1 w_2 \#_2 \cdots \#_{n-1} w_n \#_n \mid \exists j \in \{1, \ldots, n\} \text{ such that } M_j \text{ does not accept } w_j\}.$$

Clearly, $d_{out}(s)$ can be represented by a DFA whose size is polynomial in the sizes
of the $M_i$'s. Indeed, the DFA just simulates every $M_i$ on the string following
$\#_{i-1}$, until it encounters $\#_i$. It then verifies that at least one $M_j$ rejects.

It is easy to see that this reduction can be carried out by a deterministic
logspace algorithm.
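
The behaviour of the transducer and of the output language in part (1) can be mimicked by a small Python sketch. It is meant only to make the reduction concrete: the dictionary encoding of DFAs, the token names `"#0"`, `"#1"`, ..., and the function names `transduce`, `dfa_accepts` and `in_d_out` are all assumptions of this illustration.

```python
def dfa_accepts(dfa, word):
    """Run a DFA given as {'start', 'delta', 'finals'}; a missing transition rejects."""
    state = dfa["start"]
    for c in word:
        state = dfa["delta"].get((state, c))
        if state is None:
            return False
    return state in dfa["finals"]


def transduce(word, n):
    """Children of the output root: #0 w #1 w ... #(n-1) w #n."""
    out = ["#0"]
    for j in range(1, n + 1):
        out.extend(word)
        out.append("#%d" % j)
    return out


def in_d_out(tokens, dfas):
    """True iff tokens = #0 w1 #1 ... wn #n and some M_j does not accept w_j."""
    n = len(dfas)
    if not tokens or tokens[0] != "#0":
        return False
    j, word, some_reject = 1, [], False
    for tok in tokens[1:]:
        if tok in ("0", "1") and j <= n:
            word.append(tok)
        elif j <= n and tok == "#%d" % j:
            some_reject = some_reject or not dfa_accepts(dfas[j - 1], word)
            j, word = j + 1, []
        else:
            return False
    return j == n + 1 and some_reject


# Two toy DFAs over {0, 1}: an even, respectively odd, number of 1s.
delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
even = {"start": "e", "delta": delta, "finals": {"e"}}
odd = {"start": "e", "delta": delta, "finals": {"o"}}
```

With `even` and `odd` the intersection is empty, so every transduced string lies in the output language; with two copies of `even`, the input 11 is accepted by both automata and its image falls outside the output language.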

(2) This is an easy reduction from the universality problem of an NFA N with
alphabet {0, 1}. The latter problem is PSPACE-hard, as shown in the Appendix.
Again, the input DTD $(d_{in}, s)$ defines a tree of depth two
where $d_{in}(s) = (0 + 1)^*$. The tree transducer is the identity transformation.
The output DTD $d_{out}$ has as start symbol s and $d_{out}(s) = L(N)$. Hence, this
instance typechecks if and only if $\{0, 1\}^* \subseteq L(N)$.

This reduction can be carried out by a deterministic logspace algorithm. □



4.3 Non-deleting: Fixed Output Schema 

Again, upper bounds carry over from [23]. Also, when the output DTD is
a DTD(NFA), we can convert it into an equivalent DTD(DFA) in constant
time, as the output schema is fixed. As the PTIME typechecking algorithm for
$TC[\mathcal{T}_{nd,bc}, DTD(DFA)]$ in [23] also works when the input DTD is a DTD(NFA),
we have that the problem $TC^{o}[\mathcal{T}_{nd,bc}, DTD(NFA)]$ is in PTIME. As the PTIME-hardness
proof for $TC[\mathcal{T}_{nd,bc}, DTD(DFA)]$ in [23] uses a fixed output schema, we immediately
obtain the following.

Theorem 4.6. $TC^{o}[\mathcal{T}_{nd,bc}, DTD(NFA)]$ is PTIME-complete.

The lower bound in the presence of tree automata will be discussed in a later
section. The case requiring some real work is $TC^{o}[\mathcal{T}_{nd,uc}, DTD(DFA)]$.

Theorem 4.7. $TC^{o}[\mathcal{T}_{nd,uc}, DTD(DFA)]$ is PSPACE-complete.

Proof. In [23], it was shown that the problem is in PSPACE. We proceed by
showing PSPACE-hardness.

We use a LOGSPACE reduction from the corridor tiling problem [2]. Let
$(T, V, H, \bar\vartheta, \bar\beta)$ be a tiling system, where $T = \{\vartheta_1, \ldots, \vartheta_k\}$ is the set of tiles,
$V \subseteq T \times T$ and $H \subseteq T \times T$ are the sets of vertical and horizontal constraints respectively,
and $\bar\vartheta$ and $\bar\beta$ are the top and bottom row, respectively. Let n be the width of
$\bar\vartheta$ and $\bar\beta$. The tiling system has a solution if there is an $m \in \mathbb{N}$ such that the
space $m \times n$ (m rows and n columns) can be correctly tiled with the additional
requirement that the bottom and top row are $\bar\beta$ and $\bar\vartheta$, respectively.

We define the input DTD $d_{in}$ over the alphabet
$\Sigma := \{(i, \vartheta_j) \mid j \in \{1, \ldots, k\}, i \in \{1, \ldots, n\}\} \cup \{\#\}$; r is the start symbol. Define



where we denote by $\Sigma_i$ the set $\{(i, \vartheta_j) \mid j \in \{1, \ldots, k\}\}$. Here, # functions as
a row separator. For all other alphabet symbols $a \in \Sigma$, $d_{in}(a) = \varepsilon$. So, $d_{in}$
encodes all possible tilings that start and end with the bottom row $\bar\beta$ and the
top row $\bar\vartheta$, respectively.

We now construct a tree transducer $B = (Q_B, \Sigma, q^0_B, R_B)$ and an output
DTD $d_{out}$ such that the tiling system has no correct corridor tiling if and only if B typechecks
with respect to $d_{in}$ and $d_{out}$. Intuitively, the transducer and the output DTD
have to work together to determine errors in input tilings. There can only
be two types of error: two tiles do not match horizontally or two tiles do not
match vertically. The main difficulty is that the output DTD is fixed and can,
therefore, not depend on the tiling system. The transducer is constructed in such
a way that it prepares in parallel the verification for all horizontal and vertical
constraints by the output schema. In particular, the transducer outputs specific
symbols from a fixed set independent of the tiling system allowing the output
schema to determine whether an error occurred.

The state set $Q_B$ is partitioned into two sets, $Q_{hor}$ and $Q_{ver}$:

• $Q_{hor}$ is for the horizontal constraints: for every $i \in \{1, \ldots, n - 1\}$ and
$\vartheta \in T$, $q_{i,\vartheta} \in Q_{hor}$ transforms the rows in the tiling such that it is possible
to check that when position i carries a $\vartheta$, position $i + 1$ carries a $\vartheta'$ such
that $(\vartheta, \vartheta') \in H$; and,

• $Q_{ver}$ is for the vertical constraints: for every $i \in \{1, \ldots, n\}$ and $\vartheta \in T$,
$p_{i,\vartheta} \in Q_{ver}$ transforms the rows in the tiling such that it is possible to check
that when position i carries a $\vartheta$, the next row carries a $\vartheta'$ on position i
such that $(\vartheta, \vartheta') \in V$.

The tree transducer B always starts its transformation with the rule

$$(q^0_B, r) \to r(w),$$

where w is the concatenation of all of the above states, separated by the delimiter
$. The other rules are of the following form:

Horizontal constraints: for all $(j, \vartheta') \in \Sigma$ add the rule $(q_{i,\vartheta}, (j, \vartheta')) \to \alpha$
where $q_{i,\vartheta} \in Q_{hor}$ and

$$\alpha = \begin{cases}
\text{trigger} & \text{if } j = i \text{ and } \vartheta' = \vartheta \\
\text{other} & \text{if } j = i \text{ and } \vartheta' \neq \vartheta \\
\text{ok} & \text{if } j = i + 1 \text{ and } (\vartheta, \vartheta') \in H \\
\text{error} & \text{if } j = i + 1 \text{ and } (\vartheta, \vartheta') \notin H \\
\text{other} & \text{if } j \neq i \text{ and } j \neq i + 1
\end{cases}$$

Finally, $(q_{i,\vartheta}, \#) \to \text{hor}$.

The intuition is as follows: if the i-th position in a row is labeled with $\vartheta$,
then this position is transformed into trigger. Position $i + 1$ is transformed
to ok when it carries a tile that matches $\vartheta$ horizontally. Otherwise,
it is transformed to error. All other symbols are transformed into an
other.

On a row, delimited by two hor-symbols, the output DFA rejects if and
only if there is a trigger immediately followed by an error. When there
is no trigger, then position i was not labeled with $\vartheta$. So, the label
trigger acts as a trigger for the output automaton.

Vertical constraints: for all $(j, \vartheta') \in \Sigma$, add the rule $(p_{i,\vartheta}, (j, \vartheta')) \to \alpha$
where $p_{i,\vartheta} \in Q_{ver}$ and

$$\alpha = \begin{cases}
\text{trigger1} & \text{if } (j, \vartheta') = (i, \vartheta) \text{ and } (\vartheta, \vartheta) \in V \\
\text{trigger2} & \text{if } (j, \vartheta') = (i, \vartheta) \text{ and } (\vartheta, \vartheta) \notin V \\
\text{ok} & \text{if } j = i,\ \vartheta \neq \vartheta', \text{ and } (\vartheta, \vartheta') \in V \\
\text{error} & \text{if } j = i,\ \vartheta \neq \vartheta', \text{ and } (\vartheta, \vartheta') \notin V \\
\text{other} & \text{if } j \neq i
\end{cases}$$

Finally, $(p_{i,\vartheta}, \#) \to \text{ver}$.

The intuition is as follows: if the i-th position in a row is labeled with $\vartheta$,
then this position is transformed into trigger1 when $(\vartheta, \vartheta) \in V$ and to
trigger2 when $(\vartheta, \vartheta) \notin V$. Here, both trigger1 and trigger2 act as a
trigger for the output automaton: they mean that position i was labeled
with $\vartheta$. But trigger1 and trigger2 cannot both occur in the same transformed
row, as either $(\vartheta, \vartheta) \in V$ or $(\vartheta, \vartheta) \notin V$. When position i is labeled with
$\vartheta' \neq \vartheta$, then we transform this position into ok when $(\vartheta, \vartheta') \in V$, and into
error when $(\vartheta, \vartheta') \notin V$. All other positions are transformed into other.

The output DFA then works as follows. If a position is labeled trigger1
then it rejects if there is a trigger2 or an error occurring after the next
ver. If a position is labeled trigger2, then it rejects if there is a trigger2
or an error occurring after the next ver. Otherwise, it accepts that row.

By making use of the delimiters ver and hor, both above described automata
can be combined into one automaton, taking care of the vertical and the horizontal
constraints. This automaton resets to its initial state whenever it reads
the delimiter symbol $. Note that the output automaton is defined over the fixed
alphabet {trigger, trigger1, trigger2, error, ok, other, hor, ver, $}. □
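
To make the horizontal part of the construction concrete, the following Python sketch computes the image of a single row under a state q_{i,ϑ} and applies the fixed "trigger immediately followed by error" test. It is a simplified illustration, not the construction itself: rows are plain lists of tile names (the position component of the input symbols is implicit in the list index), and the function names are hypothetical.

```python
def horizontal_image(row, i, theta, H):
    """Image of one row under the state q_{i,theta}; row[j-1] is the tile at position j."""
    out = []
    for j, tile in enumerate(row, start=1):
        if j == i:
            out.append("trigger" if tile == theta else "other")
        elif j == i + 1:
            out.append("ok" if (theta, tile) in H else "error")
        else:
            out.append("other")
    out.append("hor")  # image of the row separator #
    return out


def flags_error(image):
    """Fixed test, independent of the tiling system: trigger directly followed by error."""
    return any(a == "trigger" and b == "error" for a, b in zip(image, image[1:]))


def row_has_horizontal_error(row, tiles, H):
    """Combine the verdicts of all states q_{i,theta}, as the transducer does in parallel."""
    n = len(row)
    return any(flags_error(horizontal_image(row, i, theta, H))
               for i in range(1, n) for theta in tiles)


tiles = ["x", "y"]
H = {("x", "y"), ("y", "x")}
```

Note that `flags_error` never inspects the tiles or the relation H. This reflects the point of the proof: all knowledge about the tiling system sits in the transducer, while the test on the output uses a fixed alphabet.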

Although the results in [23] were formulated in the context of variable
schemas, the proofs for bounded copying, non-deleting tree transducers with
DTD(SL) and with DTD(DFA) schemas actually used a fixed output schema.
We can therefore sharpen these results as follows.

Theorem 4.8. 1. $TC^{o}[\mathcal{T}_{nd,bc}, DTD(SL)]$ is coNP-complete;

2. $TC^{o}[\mathcal{T}_{nd,bc}, DTD(DFA)]$ is PTIME-complete.

4.4 Non-deleting: Fixed Input and Output Schema 

We turn to the case where both input and output schemas are fixed. The 
following two theorems give us several new tractable cases. 

Theorem 4.9. 1. $TC^{io}[\mathcal{T}_{nd,bc}, DTD(SL)]$ is NLOGSPACE-complete.

2. $TC^{io}[\mathcal{T}_{nd,bc}, DTD(DFA)]$ is NLOGSPACE-complete.

Proof. For both problems, membership in NLOGSPACE follows from Theorem 4.10.
Indeed, every DTD(SL) can be rewritten into an equivalent DTD(NFA) in constant
time as the input and output schemas are fixed.

We proceed by showing NLOGSPACE-hardness. We say that an NFA
$N = (Q_N, \Sigma, \delta_N, I_N, F_N)$ has degree of nondeterminism 2 if (i) $I_N$ has at most two
elements and (ii) for every $q \in Q_N$ and $a \in \Sigma$, the set $\delta_N(q, a)$ has at most
two elements. We give a LOGSPACE reduction from the emptiness problem of an
NFA with alphabet {0, 1} and a degree of nondeterminism 2 to the typechecking
problem. According to a lemma in the Appendix, this problem is NLOGSPACE-hard.
Intuitively, the input DTD will define all possible strings over alphabet
{0, 1}. The tree transducer simulates the NFA and outputs "accept" if a computation
branch accepts, and "error" if a computation branch rejects. The output
DTD defines trees where all leaves are labeled with "error".

More concretely, let $N = (Q_N, \{0, 1\}, \delta_N, \{q^0_N\}, F_N)$ be an NFA with degree
of nondeterminism 2. The input DTD $(d_{in}, r)$ defines all unary trees, where
the unique leaf is labeled with a special marker #. That is, $d_{in}(r) = d_{in}(0) =
d_{in}(1) = (0 + 1 + \#)$ and $d_{in}(\#) = \varepsilon$. Note that these languages can be defined
by SL-formulas or DFAs which are sufficiently small for our purpose.

Given a tree $t = r(a_1(\cdots(a_n(\#))\cdots))$, the tree transducer will simulate
every computation of N on the string $a_1 \cdots a_n$. The tree transducer
$T = (Q_T, \{r, \#, 0, 1, \text{error}, \text{accept}\}, q^0_T, R_T)$ simulates N's nondeterminism by copying
the remainder of the input twice in every step. Formally, $Q_T$ is the union
of $\{q^0_T\}$ and $Q_N$, and $R_T$ contains the following rules:

• $(q^0_T, r) \to r(q^0_N)$. This rule puts r as the root symbol of the output tree
and starts the simulation of N.

• $(q_N, a) \to a(q^1_N\, q^2_N)$, where $q_N \in Q_N$, $a \in \{0, 1\}$ and $\delta_N(q_N, a) =
\{q^1_N, q^2_N\}$. This rule does the actual simulation of N. By continuing in
both states $q^1_N$ and $q^2_N$, we simulate all possible computations of N.

• $(q_N, a) \to \text{error}$ if $\delta_N(q_N, a) = \emptyset$. If N rejects, we output the symbol
"error".

• $(q_N, \#) \to \text{error}$ for $q_N \notin F_N$; and

• $(q_N, \#) \to \text{accept}$ for $q_N \in F_N$. These last two rules verify whether N is
in an accepting state after reading the entire input string.

Notice that T outputs the symbol "error" (respectively "accept") if and only
if a computation branch of N rejects (respectively accepts).

The output of T is always a tree in which only the symbols "error" and
"accept" occur at the leaves. The output DTD then needs to verify that only
the symbol "error" occurs at the leaves. Formally, $d_{out}(r) = d_{out}(0) = d_{out}(1) =
\{0, 1, \text{error}\}^{+}$ and $d_{out}(\text{error}) = \varepsilon$. Again, these languages can be defined by
sufficiently small SL-formulas or DFAs.

It is easy to see that the reduction only requires logarithmic space. □
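
The effect of these rules on a unary input tree can be simulated directly. The Python sketch below collects the leaf labels of the output of T for a given input string. It is an illustration under our own assumptions: the NFA is encoded as a dictionary from (state, symbol) to a set of states, the function name `output_leaves` is hypothetical, and a transition with a single successor is followed once rather than copied twice, which changes the number of leaves but not the set of labels that occur.

```python
def output_leaves(word, delta, q0, finals):
    """Leaf labels of T's output on the unary tree r(a_1(...(a_n(#))...))."""
    leaves = []

    def run(q, pos):
        if pos == len(word):                     # T reads the marker #
            leaves.append("accept" if q in finals else "error")
            return
        successors = delta.get((q, word[pos]), set())
        if not successors:                       # N has no transition: this branch rejects
            leaves.append("error")
            return
        for q2 in sorted(successors):            # continue in every successor state
            run(q2, pos + 1)

    run(q0, 0)
    return leaves


def conforms(leaves):
    """The output DTD only admits trees whose leaves are all labeled 'error'."""
    return all(label == "error" for label in leaves)


# Toy NFA with degree of nondeterminism 2 accepting the words that contain a 1.
delta = {("s", "0"): {"s"}, ("s", "1"): {"s", "f"},
         ("f", "0"): {"f"}, ("f", "1"): {"f"}}
```

For this toy automaton the input 00 yields only "error" leaves, whereas the input 01 yields an "accept" leaf. The instance therefore does not typecheck, matching the fact that the language of the automaton is non-empty.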

Theorem 4.10. $TC^{io}[\mathcal{T}_{nd,uc}, DTD(NFA)]$ is NLOGSPACE-complete.

Proof. The NLOGSPACE-hardness of the problem follows from Theorem 4.9(2),
where it is shown that the problem is already NLOGSPACE-hard when DTD(DFA)s
are used as input and output schema.

We show that the problem is also in NLOGSPACE. Thereto, let $T = (Q_T, \Sigma, q^0_T, R_T)$
be the tree transducer, and let $(d_{in}, r)$ and $d_{out}$ be the input and output
DTDs, respectively. As both $d_{in}$ and $d_{out}$ are fixed, we can assume without loss
of generality that they are reduced (see footnote 4). For the same reason, we can also assume
that the NFAs in $d_{in}$ and $d_{out}$ are determinized.

4 In general, reducing a DTD (NFA) is PTlME-complete f Section 13. 21 . 
We guess a sequence of state-label pairs $(p_0, a_0), (p_1, a_1), \ldots, (p_n, a_n)$ where
$n \leq |Q_T||\Sigma|$ such that

• $(p_0, a_0) = (q^0_T, r)$; and

• for every pair $(p_i, a_i)$, $p_{i+1}$ occurs in $\mathrm{rhs}(p_i, a_i)$ and $a_{i+1}$ occurs in some
string in $L(d_{in}(a_i))$.

Each time we guess a new pair in this sequence, we forget the previous one,
so that we only keep a state, an alphabet symbol, a counter, and the binary
representation of $|Q_T||\Sigma|$ on tape.

For simplicity, we write $(p_n, a_n)$ as (p, a) in the remainder of the proof. We
guess a node $u \in \mathrm{Dom}(\mathrm{rhs}(p, a))$. Let $b = \mathrm{lab}^{\mathrm{rhs}(p,a)}(u)$ and let $z_0 q_1 z_1 \cdots q_k z_k$
be the concatenation of u's children, where every $z_0, \ldots, z_k \in \Sigma^*$ and every
$q_1, \ldots, q_k \in Q_T$. We then want to check whether there exists a string $w \in d_{in}(a)$
such that $z_0 q_1[w] z_1 \cdots q_k[w] z_k$ is not accepted by $d_{out}(b)$. Recall
that, for a state $q \in Q_T$, we denote by $q[w]$ the homomorphic extension of $q[c]$
for $c \in \Sigma$, which is top(rhs(q, c)) in the case of non-deleting tree transducers.
We could do this by guessing w one symbol at a time and simulating k copies
of $d_{out}(b)$ and one copy of $d_{in}(a)$ in parallel, like in the proof of Theorem 4.4.
However, as k is not fixed, the algorithm would use superlogarithmic space.

So, we need a different approach. To this end, let
$A = (Q_{in}, \Sigma, \delta_{in}, q^0_{in}, F_{in})$ and
$B = (Q_{out}, \Sigma, \delta_{out}, q^0_{out}, F_{out})$ be the DFAs accepting
$d_{in}(a)$ and $d_{out}(b)$, respectively. To every $q \in Q_T$, we associate a
function

$f_q : Q_{out} \times \Sigma \to Q_{out} : (p', c) \mapsto \hat{\delta}_{out}(p', q[c])$,

where $\hat{\delta}_{out}$ denotes the canonical extension of $\delta_{out}$ to
strings in $\Sigma^*$. Note that there are at most
$|Q_{out}|^{|Q_{out}||\Sigma|}$ such functions. Let $K$ be the cardinality of
the set $\{f_q \mid q \in Q_T\}$. Hence, $K$ is bounded from above by
$|Q_{out}|^{|Q_{out}||\Sigma|}$, which is a constant (with respect to the
input). Let $f_1, \ldots, f_K$ be an arbitrary enumeration of
$\{f_q \mid q \in Q_T\}$.

The typechecking algorithm continues as follows. We start by writing the
$(1 + K \cdot |Q_{out}|)$-tuple
$(q^0_{in}, q'_1, \ldots, q'_{|Q_{out}|}, \ldots, q'_1, \ldots, q'_{|Q_{out}|})$
on tape, where $Q_{out} = \{q'_1, \ldots, q'_{|Q_{out}|}\}$. We will refer to this
tuple as the tuple $p := (p'_0, \ldots, p'_{K \cdot |Q_{out}|})$.
We explain how we update $p$ when guessing $w$ symbol by symbol. Every time
we guess the next symbol $c$ of $w$, we overwrite the tuple $p$ by

$(\delta_{in}(p'_0, c), f_1(p'_1, c), \ldots, f_1(p'_{|Q_{out}|}, c), \ldots,
f_K(p'_{(K-1) \cdot |Q_{out}| + 1}, c), \ldots, f_K(p'_{K \cdot |Q_{out}|}, c))$.

Notice that there are at most $|Q_{in}| \cdot |Q_{out}|^{K \cdot |Q_{out}|}$
different $(K \cdot |Q_{out}| + 1)$-tuples of this form, which is a constant.
We nondeterministically determine when we stop guessing symbols of $w$.

It now remains to verify whether $w$ was indeed a string such that
$w \in d_{in}(a)$ and $z_0 q_1[w] z_1 \cdots q_k[w] z_k \notin d_{out}(b)$. The
former condition is easy to test: we simply have to test whether
$p'_0 \in F_{in}$. To test the latter condition, we read the string
$z_0 q_1 z_1 \cdots q_k z_k$ from left to right while performing the following
tests. We keep a state of $d_{out}(b)$ in memory and refer to it as the
"current state".

1. The initial current state is $q^0_{out}$.

2. If the current state is $p'$ and we read $z_j$, then we change the current
state to $\hat{\delta}_{out}(p', z_j)$.

3. If the current state is $p'$ and we read $q_j$, then we change the current
state to $p'_i$ in $p$, where the index $i$ is determined as follows. Let
$\ell \in \{1, \ldots, |Q_{out}|\}$ and $m \in \{1, \ldots, K\}$ be the smallest
integers such that

• $p' = q'_\ell$ in $Q_{out}$, and

• $f_{q_j} = f_m$.

Then $i = (m - 1) \cdot |Q_{out}| + \ell$, that is, $p'_i$ is the entry for
$q'_\ell$ in the block of $f_m$.

Note that deciding whether $p' = q'_\ell$ and $f_{q_j} = f_m$ can be done
deterministically in logarithmic space, as the output schema is fixed.
Consequently, $i$ can also be computed in constant time and space.

4. We stop and accept if the current state is a non-accepting state after
reading $z_k$.

□
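The tuple update and the final left-to-right pass can be illustrated by the following minimal Python sketch. It is not part of the original proof: it replaces the nondeterministic guessing of $w$ by a given string, assumes complete DFAs given as dictionaries, and all names (`run`, `is_counterexample`, `top`, `pattern`) are hypothetical.

```python
def run(delta, state, word):
    """Canonical extension of a complete DFA transition table to strings."""
    for c in word:
        state = delta[(state, c)]
    return state

def is_counterexample(w, pattern, top, dfa_in, dfa_out):
    """True iff A accepts w and B rejects z0 q1[w] z1 ... qk[w] zk.

    pattern: list of ('z', string) and ('q', transducer_state) items.
    top[q][c]: the string q[c] (an assumption of this sketch).
    Each DFA is a tuple (delta, initial, finals, states, alphabet).
    """
    d_in, q0_in, fin_in, _, sigma = dfa_in
    d_out, q0_out, fin_out, states_out, _ = dfa_out

    def f_table(q):
        # The function f_q written out as a finite table.
        return tuple(run(d_out, s, top[q][c]) for s in states_out for c in sigma)

    # One block per distinct function f_m; block[s] is the state of B reached
    # from s after reading q[w] for the prefix w read so far.
    blocks = {}
    for q in top:
        blocks.setdefault(f_table(q), (q, {s: s for s in states_out}))

    p0 = q0_in
    for c in w:  # overwrite every component of the tuple for each symbol
        p0 = d_in[(p0, c)]
        for rep, block in blocks.values():
            for s in states_out:
                block[s] = run(d_out, block[s], top[rep][c])
    if p0 not in fin_in:
        return False

    cur = q0_out  # final pass over z0 q1 z1 ... qk zk
    for kind, item in pattern:
        if kind == 'z':
            cur = run(d_out, cur, item)
        else:
            cur = blocks[f_table(item)][1][cur]
    return cur not in fin_out
```

The number of blocks depends only on the output DFA and the alphabet, not on the number $k$ of states occurring in the pattern, which is the point of the construction.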

Theorem 4.11. $TC^{io}[T_{nd,bc}, DTA(DFA)]$ is EXPTIME-complete.

Proof. The proof is quite analogous to the proof of Theorem 4.1. As deletion is
now disallowed, whereas it was allowed in Theorem 4.1, we need to define the
rules of the transducer $T = (Q_T, \Sigma_T, q_{copy}, R_T)$ differently.

The language defined by the input schema is exactly the same as in that
proof. The transition rules in $R_T$ are defined as follows:

• $(q_{copy}, s) \to s(q^{0}_{copy}\, q^{1}_{copy})$;

• $(q^{i}_{copy}, \#) \to \#(q^{i0}_{copy}\, q^{i1}_{copy})$ for
$i \in D(\lceil \log n \rceil - 1) - \{\varepsilon\}$;

• $(q^{i}_{copy}, \#) \to \#(\mathrm{start}_k)$, where
$i \in D(\lceil \log n \rceil) - D(\lceil \log n \rceil - 1)$, and $i$ is
the binary representation of $k$;

• $(q^{i}_{copy}, a) \to \mathrm{error}$ for $a \in \Sigma$ and
$i \in D(\lceil \log n \rceil)$;

• $(\mathrm{start}_k, \#) \to \mathrm{error}$ for all
$k = 1, \ldots, 2^{\lceil \log n \rceil}$;

• $(q^{\ell}, a_r) \to \varepsilon$ and $(q^{r}, a_\ell) \to \varepsilon$ for all
$q \in Q_j$, $j = 1, \ldots, n$;

• $(q^{\ell}, a_\ell) \to a_\ell(q_1^{\ell}\, q_2^{r})$ and
$(q^{r}, a_r) \to a_r(q_1^{\ell}\, q_2^{r})$, for every $q \in Q_i$,
$i = 1, \ldots, n$, such that $\delta_i(q, a) = q_1 q_2$, and $a$ is an internal
symbol;

• $(q^{\ell}, a_\ell) \to a_\ell(q_1^{\ell})$ and
$(q^{r}, a_r) \to a_r(q_1^{\ell})$, for every $q \in Q_i$, $i = 1, \ldots, n$,
such that $\delta_i(q, a) = q_1$ and $a$ is an internal symbol;

• $(q^{\ell}, a_\ell) \to \varepsilon$ and $(q^{r}, a_r) \to \varepsilon$ for
every $q \in Q_i$, $i = 1, \ldots, n$, such that $\delta_i(q, a) = \varepsilon$
and $a$ is a leaf symbol; and

• $(q^{\ell}, a_\ell) \to \mathrm{error}$ and $(q^{r}, a_r) \to \mathrm{error}$
for every $q \in Q_i$, $i = 1, \ldots, n$, such that $\delta_i(q, a)$ is
undefined.

It is straightforward to verify that, on input $s(\#(\#(\cdots \#(t))))$, $T$
performs the identity transformation if and only if there are
$\lceil \log n \rceil$ #-symbols in the input and
$t \in L(A_1) \cap \cdots \cap L(A_n)$. All other outputs contain at least one
leaf labeled "error".

Finally, the output tree automaton accepts all trees with at least one leaf
that is labeled "error". So the only counterexamples for typechecking are those
trees that are accepted by all automata.

It is easy to see that the reduction can be carried out in deterministic
logarithmic space, that $T$ has copying width 2, and that the input and output
schemas do not depend on $A_1, \ldots, A_n$. □

5 Conclusion 

We considered the complexity of typechecking in the presence of fixed input
and/or output schemas. We have settled an open question, namely that
$TC[T_{nd,bc}, DTA]$ is EXPTIME-complete.

In comparison with the results in [23], fixing input and/or output schemas
only lowers the complexity in the presence of DTDs and when deletion is
disallowed. Here, we see that the complexity is lowered when

1. the input schema is fixed, in the case of DTD(SL)s; 

2. the input schema is fixed, in the case of DTD(DFA)s; 

3. the output schema is fixed, in the case of DTD(NFA)s; and 

4. both input and output schema are fixed, in all cases. 

In all of these cases, the complexity of the typechecking problem is in polynomial 
time. 

It is striking, however, that in many cases, the complexity of typechecking
does not decrease significantly by fixing the input and/or output schema, and
most cases remain intractable. We have to leave the precise complexity (that
is, the PTIME-hardness) of $TC^{i}[T_{nd,uc}, DTD(SL)]$ as an open problem.

Appendix: Definitions and Basic Results



The purpose of this Appendix is to prove some lemmas that we use in the
body of the paper. We first introduce some notations and definitions needed for
the propositions and proofs further on in this Appendix. We also survey some
complexity bounds on decision problems concerning automata that are used
throughout the paper.

We show that the complexities of the classical decision problems of string and 
tree automata are preserved when the automata operate over fixed alphabets. 
We will consider the following decision problems for string automata: 

Emptiness: Given an automaton $A$, is $L(A) = \emptyset$?

Universality: Given an automaton $A$, is $L(A) = \Sigma^*$?

Intersection emptiness: Given the automata $A_1, \ldots, A_n$, is
$L(A_1) \cap \cdots \cap L(A_n) = \emptyset$?

The corresponding decision problems for tree automata are defined analogously. 

We associate to each label $a \in \Sigma$ a unique binary string
$\mathrm{enc}(a) \in \{0, 1\}^*$ of length $\lceil \log |\Sigma| \rceil$. For a
string $s = a_1 \cdots a_n$, $\mathrm{enc}(s) = \mathrm{enc}(a_1) \cdots
\mathrm{enc}(a_n)$. This encoding can be extended to string languages in the
obvious way.

We show how to extend the encoding "enc" to trees over alphabet
$\{0, 1, 0', 1'\}$. Here, 0 and 1 are internal labels, while 0' and 1' are leaf
labels. Let $\mathrm{enc}(a) = b_1 \cdots b_k$ for $a \in \Sigma$. Then we denote
by tree-enc($a$) the unary tree $b_1(b_2(\cdots(b_k)))$ if $a$ is an internal
label, and the unary tree $b_1(b_2(\cdots(b'_k)))$ otherwise. Then, the
enc-function can be extended to trees as follows: for $t = a(t_1 \cdots t_n)$,

$\mathrm{enc}(t) = \text{tree-enc}(a)(\mathrm{enc}(t_1) \cdots \mathrm{enc}(t_n))$.

Note that we abuse notation here. The hedge
$\mathrm{enc}(t_1) \cdots \mathrm{enc}(t_n)$ is intended to be the child of the
leaf in tree-enc($a$). The encoding can be extended to tree languages in the
obvious way.
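As an illustration, the two encodings can be sketched in Python as follows. This is only a sketch under assumptions that are not in the text: labels are numbered in sorted order, the alphabet has at least two labels, trees are pairs `(label, children)`, and a label is treated as a leaf label exactly when its node has no children. The names `make_enc` and `enc_tree` are hypothetical.

```python
def make_enc(alphabet):
    """Assign each label a binary string of length ceil(log2 |alphabet|)."""
    labels = sorted(alphabet)
    width = (len(labels) - 1).bit_length()  # equals ceil(log2(n)) for n >= 1
    return {a: format(i, '0{}b'.format(width)) for i, a in enumerate(labels)}

def enc_tree(t, code):
    """Encode a tree (label, children) over {0, 1, 0', 1'}."""
    label, children = t
    bits = code[label]
    kids = [enc_tree(c, code) for c in children]
    # The last bit is primed for a leaf; the encoded children hang below it.
    node = (bits[-1] + ("'" if not children else ""), kids)
    # Wrap the remaining bits around it as a unary chain.
    for b in reversed(bits[:-1]):
        node = (b, [node])
    return node
```

For the alphabet {a, b, c}, the labels receive the codes 00, 01 and 10, so the tree a(b c) becomes 0(0(0(1') 1(0'))).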

Proposition .1. Let $B$ be a TDBTA. Then there is a TDBTA $B'$ over the
alphabet $\{0, 1, 0', 1'\}$ such that $L(B') = \mathrm{enc}(L(B))$. Moreover,
$B'$ can be constructed from $B$ in LOGSPACE.

Proof. Let $B = (Q_B, \Sigma_B, \delta_B, F_B)$ be a TDBTA. Let
$k := \lceil \log |\Sigma_B| \rceil$. We define

$B' = (Q_{B'}, \{0, 1, 0', 1'\}, \delta_{B'}, F_{B'})$. Set
$Q_{B'} = \{q^x \mid q \in Q_B$ and $x$ is a prefix of $\mathrm{enc}(a)$, where
$a \in \Sigma_B\}$ and $F_{B'} = \{q^\varepsilon \mid q \in F_B\}$. To define the
transition function, we introduce some notation. For each $a \in \Sigma_B$ and
$i, j = 1, \ldots, \lceil \log |\Sigma_B| \rceil$, denote by $a[i{:}j]$ the
substring of $\mathrm{enc}(a)$ from position $i$ to position $j$ (we abbreviate
$a[i{:}i]$ by $a[i]$). For each transition $\delta_B(q, a) = q_1 q_2$, add the
transitions $\delta_{B'}(q^\varepsilon, a[1]) = q^{a[1]}$,
$\delta_{B'}(q^{a[1]}, a[2]) = q^{a[1:2]}$, $\ldots$,
$\delta_{B'}(q^{a[1:k-1]}, a[k]) = q_1^\varepsilon q_2^\varepsilon$.
Other transitions are defined analogously. Clearly, $B'$ is a TDBTA,
$L(B') = \mathrm{enc}(L(B))$, and $B'$ can be constructed from $B$ in LOGSPACE. □

It is straightforward to show that Proposition .1 also holds for NFAs and
DFAs (the proofs are analogous). It is immediate from Proposition .1 that lower
bounds of decision problems for automata over arbitrary alphabets 1291 l3*T] 
carry over to automata working over fixed alphabets. Hence, we obtain the
following corollary to Proposition .1.

Corollary .2. Over the alphabet {0, 1}, the following statements hold: 

1. Intersection emptiness of an arbitrary number of DFAs is PSPACE-hard. 

2. Universality of NFAs is PSPACE-hard. 

Over the alphabet {0, 1,0', 1'}, the following statement holds: 

(3) Intersection emptiness of an arbitrary number of TDBTAs is
EXPTIME-hard.

Lemma .3 now immediately follows from NLOGSPACE-hardness of the reachability
problem on graphs with out-degree 2 [15].

Lemma .3. The emptiness problem for an NFA with alphabet $\{0, 1\}$ and degree
of nondeterminism 2 is NLOGSPACE-hard.

We now aim at proving Proposition .6, which states that we can find integer
solutions to arbitrary Boolean combinations of linear (in)equalities in
polynomial time, when the number of variables is fixed. To this end, we revisit
a lemma that is due to Ferrante and Rackoff [13].

First, we need some definitions. We define logical formulas with variables
$x_1, x_2, \ldots$ and linear equations with factors in $\mathbb{Q}$. A term is
an expression of the form $a_1/b_1$, $a_1/b_1\, x_1 + \cdots + a_n/b_n\, x_n$, or
$a_1/b_1\, x_1 + \cdots + a_{n-1}/b_{n-1}\, x_{n-1} + a_n/b_n$,
where $a_i, b_i \in \mathbb{N}$ for $i = 1, \ldots, n$. An atomic formula is
either the string "true", the string "false", or a formula of the form
$\vartheta_1 = \vartheta_2$, $\vartheta_1 < \vartheta_2$, or
$\vartheta_1 > \vartheta_2$. A formula is built up from atomic formulas using
conjunction, disjunction, negation, and the symbol $\exists$ in the usual
manner. Formulas are interpreted in the obvious manner over $\mathbb{Q}$. For
instance, the formula
$\neg \exists x_1, x_2\, (x_1 < x_2) \wedge \neg(\exists x_3\, (x_1 < x_3 \wedge x_3 < x_2))$
states that for every two different rational numbers, there exists a third
rational number that lies strictly between them.

The size of a formula $\Phi$ is the sum of the number of brackets, Boolean
connectives, the sizes of the variables, and the sizes of all rational
constants occurring in $\Phi$. Here, we assume that all rational constants are
written as $a/b$, where $a$ and $b$ are integers, written in binary notation.
We assume that variables are written as $x_i$, where $i$ is written in binary
notation.

Lemma .4 (Lemma 1 in [13]). Let $\Phi(x_1, \ldots, x_n)$ be a quantifier-free
formula. Then there exists a PTIME procedure for obtaining another
quantifier-free formula, $\Phi'(x_1, \ldots, x_{n-1})$, such that

$\Phi'(x_1, \ldots, x_{n-1})$ is equivalent to $\exists x_n \Phi(x_1, \ldots, x_n)$.

Proof. Let $\Phi(x_1, \ldots, x_n)$ be a quantifier-free formula.

Step 1: Solve for $x_n$ in each atomic formula of $\Phi$. That is, obtain a
quantifier-free formula, $\Psi^n(x_1, \ldots, x_n)$, such that every atomic
formula of $\Psi^n$ either does not involve $x_n$ or is of the form
(i) $x_n < \vartheta$, (ii) $x_n > \vartheta$, or (iii) $x_n = \vartheta$,
where $\vartheta$ is a term not involving $x_n$.

Step 2: We now make the following definitions:

Given $\Psi^n(x_1, \ldots, x_n)$, to get $\Psi^n_{-\infty}(x_1, \ldots, x_{n-1})$,
respectively, $\Psi^n_{+\infty}(x_1, \ldots, x_{n-1})$, replace

$x_n < \vartheta$ in $\Psi^n$ by "true" (respectively, "false");

$x_n > \vartheta$ in $\Psi^n$ by "false" (respectively, "true"); and,

$x_n = \vartheta$ in $\Psi^n$ by "false" (respectively, "false").

The intuition is that, for any rational numbers $r_1, \ldots, r_{n-1}$, if $r$ is
a sufficiently small rational number, then $\Psi^n(r_1, \ldots, r_{n-1}, r)$ and
$\Psi^n_{-\infty}(r_1, \ldots, r_{n-1})$ are equivalent. A similar statement can
be made for $\Psi^n_{+\infty}$ and $r$ sufficiently large.

Step 3: We will now eliminate the quantifier from
$\exists x_n \Psi^n(x_1, \ldots, x_n)$. Let $U$ be the set of all terms
$\vartheta$ (not involving $x_n$) such that $x_n > \vartheta$,
$x_n < \vartheta$, or $x_n = \vartheta$ is an atomic formula of $\Psi^n$.
Lemma 1.1 in [13] then shows that $\exists x_n \Psi^n(x_1, \ldots, x_n)$ is
equivalent to the quantifier-free formula $\Phi'(x_1, \ldots, x_{n-1})$ defined
to be

$\Psi^n_{-\infty} \vee \Psi^n_{+\infty} \vee \bigvee_{\vartheta, \vartheta' \in U} \Psi^n_{(\vartheta + \vartheta')/2}$,

where $\Psi^n_{(\vartheta + \vartheta')/2} = \Psi^n(x_1, \ldots, x_{n-1}, (\vartheta + \vartheta')/2)$.

□
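To make the choice of test points concrete, the following Python sketch handles only the simplest special case, which is an assumption of the sketch and not the general procedure: a single variable $x$, with every term in $U$ a rational constant. It decides $\exists x\, \Psi(x)$ by evaluating $\Psi$ at one point below all constants, one point above all constants, and all midpoints $(\vartheta + \vartheta')/2$. The tuple-based formula representation and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Formulas: ('lt', c), ('gt', c), ('eq', c) stand for x < c, x > c, x = c;
# ('not', f), ('and', f, g), ('or', f, g) are the Boolean connectives.

def constants(f):
    """Collect the set U of constants that x is compared with."""
    if f[0] in ('lt', 'gt', 'eq'):
        return {Fraction(f[1])}
    if f[0] == 'not':
        return constants(f[1])
    return constants(f[1]) | constants(f[2])

def holds(f, x):
    """Evaluate the formula at the rational point x."""
    op = f[0]
    if op == 'lt':
        return x < f[1]
    if op == 'gt':
        return x > f[1]
    if op == 'eq':
        return x == f[1]
    if op == 'not':
        return not holds(f[1], x)
    if op == 'and':
        return holds(f[1], x) and holds(f[2], x)
    return holds(f[1], x) or holds(f[2], x)

def exists_x(f):
    """Decide whether some rational x satisfies f, using finitely many points."""
    U = constants(f)
    if not U:
        points = [Fraction(0)]
    else:
        points = [min(U) - 1, max(U) + 1]            # stand-ins for -inf, +inf
        points += [(a + b) / 2 for a, b in product(U, repeat=2)]
    return any(holds(f, x) for x in points)
```

Taking $\vartheta = \vartheta'$ yields the constants themselves, which is what makes equalities such as $x = 3$ satisfiable in this scheme.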

The following proposition is implicit in the work by Ferrante and Rackoff [13];
we prove it for completeness:

Proposition .5. Let $\Phi(x_1, \ldots, x_n)$ be a quantifier-free formula. If $n$
is fixed, then satisfiability of $\Phi$ over $\mathbb{Q}$ can be decided in
PTIME. Moreover, if $\Phi$ is satisfiable, we can find
$(v_1, \ldots, v_n) \in \mathbb{Q}^n$ such that $\Phi(v_1, \ldots, v_n)$ is true
in polynomial time.

Proof. We first show that satisfiability can be decided in PTIME. To this end,
we simply iterate over the three steps in the proof of Lemma .4 until we obtain
a formula without variables. Hence, in each iteration, one variable $x_i$ is
eliminated from $\Phi$. For every $i = 1, \ldots, n$, let $\Phi^i$ be the formula
obtained after eliminating variable $x_i$.

Notice that, in each iteration of the algorithm, the number of atomic formulas
grows quadratically when going from $\Phi^i$ to $\Phi^{i-1}$. However, as there
are only a constant number of iterations, the number of atomic formulas in the
resulting formula $\Phi^1$ is still polynomial. Moreover, Ferrante and Rackoff
show that the absolute value of every integer constant occurring in any
rational constant in $\Phi^1$ is at most $(s_0)^{14^n}$, where $s_0$ is the
largest absolute value of any integer constant occurring in any rational
constant in $\Phi$ (cf. page 73 in [13]). As $n$ is a fixed number, we can
decide whether $\Phi^1$ is satisfiable in polynomial time.

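Suppose that $\Phi$ is satisfiable. We now show how we can construct
$(v_1, \ldots, v_n) \in \mathbb{Q}^n$ in polynomial time such that
$\Phi(v_1, \ldots, v_n)$ is true. For a term $\vartheta$ using variables
$x_1, \ldots, x_{i-1}$, we denote by $\vartheta(v_1, \ldots, v_{i-1})$ the
rational number obtained by replacing the variables $x_1, \ldots, x_{i-1}$ in
$\vartheta$ by $v_1, \ldots, v_{i-1}$ and evaluating the resulting expression.

For every $i = 1, \ldots, n$, we construct $v_i$ from $\Psi^i_{-\infty}$,
$\Psi^i_{+\infty}$, and $\Psi^i_{(\vartheta + \vartheta')/2}$ (which are defined
in the proof of Lemma .4) as follows:

(1) If $\Psi^i_{(\vartheta + \vartheta')/2}$ is satisfiable, then
$v_i = (\vartheta(v_1, \ldots, v_{i-1}) + \vartheta'(v_1, \ldots, v_{i-1}))/2$.

(2) Otherwise, if $\Psi^i_{+\infty}$ is satisfiable, then
$v_i = \max\{\vartheta(v_1, \ldots, v_{i-1}) \mid x_i < \vartheta$ or
$x_i > \vartheta$ or $x_i = \vartheta$ is an atomic formula in $\Psi^i\} + 1$.

(3) Otherwise, if $\Psi^i_{-\infty}$ is satisfiable, then
$v_i = \min\{\vartheta(v_1, \ldots, v_{i-1}) \mid x_i < \vartheta$ or
$x_i > \vartheta$ or $x_i = \vartheta$ is an atomic formula in $\Psi^i\} - 1$.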

It remains to show that we can represent every $v_i$ in a polynomial manner.

In the proof of Lemma 2 in [13], Ferrante and Rackoff show that, if $w_i$ is the
maximum absolute value of any integer occurring in the definition of
$v_1, \ldots, v_i$, then we have the recurrence

$w_{i+1} \le (s_0)^{2^{cn}} \cdot (w_i)^{i}$,

for a constant $c$ and $s_0$ defined as before. Let $c' = 2^{cn}$. Hence, the
maximum number of bits needed to represent the largest integer in $v_{i+1}$ is
$\log((s_0)^{c'} \cdot (w_i)^{i}) = c' \log(s_0) + i \log(w_i)$, which is
polynomially larger than $\log(w_i)$, the number of bits needed to represent the
largest integer in $v_i$.

As we only have a constant number of iterations, the number of bits needed to
represent the largest integer occurring in the definition of $v_n$ is also
polynomial. □

The following proposition is a generalization of a well-known theorem by
Lenstra which states that there exists a polynomial time algorithm to find an
integer solution for a conjunction of linear (in)equalities with rational
factors and a fixed number of variables [18].

Proposition .6. There exists a PTIME algorithm that decides whether a Boolean
combination of linear (in)equalities with rational factors and a fixed number of
variables has an integer solution.

Proof. Note that we cannot simply put the Boolean combination into disjunctive 
normal form, as this would lead to an exponential increase of its size. 

Let $\Phi(x_1, \ldots, x_n)$ be a Boolean combination of formulas
$\varphi_1, \ldots, \varphi_m$ with variables $x_1, \ldots, x_n$ that range over
$\mathbb{Z}$. Here, $n$ is a constant integer greater than zero. Without loss of
generality, we can assume that every $\varphi_i$ is of the form

$k_{i,1} \times x_1 + \cdots + k_{i,n} \times x_n + k_i \ge 0$,

where $k_i, k_{i,1}, \ldots, k_{i,n} \in \mathbb{Q}$.

We describe a PTIME procedure for finding a solution for $x_1, \ldots, x_n$,
that is, for finding values $v_1, \ldots, v_n \in \mathbb{Z}$ such that
$\Phi(v_1, \ldots, v_n)$ evaluates to true.

First, we introduce some notation and terminology. For every
$i = 1, \ldots, m$, we denote by $\varphi'_i$ the formula
$k_{i,1} \times x_1 + \cdots + k_{i,n} \times x_n + k_i = 0$. In the following,
we freely identify $\varphi'_i$ with the hyperplane it defines in
$\mathbb{R}^n$. For an $n$-tuple $y = (y_1, \ldots, y_n) \in \mathbb{Q}^n$, we
denote by $\varphi'_i(y)$ the rational number
$k_{i,1} \times y_1 + \cdots + k_{i,n} \times y_n + k_i$.

Given a set of hyperplanes $H$ in $\mathbb{R}^n$, we say that
$C \subseteq \mathbb{R}^n$ is a cell of $H$ when

(i) for every hyperplane $\varphi'_i$ in $H$, and for every pair of points
$y, z \in C$, we have that $\varphi'_i(y)\ \theta\ 0$ if and only if
$\varphi'_i(z)\ \theta\ 0$, where $\theta$ denotes "<", ">", or "="; and

(ii) there exists no $C' \supsetneq C$ with property (i).

Let $H$ be the set of hyperplanes $\{\varphi'_i \mid 1 \le i \le m\}$.

We now describe the PTIME algorithm. The algorithm iterates over the
following steps:

(1) Compute $(v'_1, \ldots, v'_n) \in \mathbb{Q}^n$ such that
$\Phi(v'_1, \ldots, v'_n)$ is true.$^5$ If no such $(v'_1, \ldots, v'_n)$
exists, the algorithm rejects.

(2) For every $\varphi'_i \in H$, let $\theta_i \in \{<, >, =\}$ be the relation
such that

$k_{i,1} \times v'_1 + \cdots + k_{i,n} \times v'_n + k_i\ \theta_i\ 0$.

For every $i = 1, \ldots, m$, let
$\varphi''_i = k_{i,1} \times x_1 + \cdots + k_{i,n} \times x_n + k_i\ \theta_i\ 0$.
So, for every $i = 1, \ldots, m$, $\varphi''_i$ defines the half-space or
hyperplane that contains the point $(v'_1, \ldots, v'_n)$.
Let $\Phi'(x_1, \ldots, x_n)$ be the conjunction

$\bigwedge_{1 \le i \le m} \varphi''_i$.

Notice that the points satisfying $\Phi'(x_1, \ldots, x_n)$ are precisely the
points in the cell $C$ of $H$ that contains $(v'_1, \ldots, v'_n)$.

(3) Solve the integer programming problem for $\Phi'(x_1, \ldots, x_n)$. That
is, find a $(v_1, \ldots, v_n) \in \mathbb{Z}^n$ such that
$\Phi'(v_1, \ldots, v_n)$ evaluates to true.

(4) If $(v_1, \ldots, v_n) \in \mathbb{Z}^n$ exists, then write
$(v_1, \ldots, v_n)$ to the output and accept.

(5) If $(v_1, \ldots, v_n) \in \mathbb{Z}^n$ does not exist, then overwrite
$\Phi(x_1, \ldots, x_n)$ with
$\Phi''(x_1, \ldots, x_n) = \Phi(x_1, \ldots, x_n) \wedge \neg \Phi'(x_1, \ldots, x_n)$
and go back to step (1).

5 Note that we abuse notation here, as the variables in $\Phi$ range over
$\mathbb{Z}$ and not $\mathbb{Q}$.

We show that the algorithm is correct. Clearly, if the algorithm accepts,
$\Phi$ has a solution. Conversely, suppose that $\Phi$ has a solution. Hence,
the algorithm computes a value $(v'_1, \ldots, v'_n) \in \mathbb{Q}^n$ in step
(1) of its first iteration. It follows from the following two observations that
the algorithm accepts:

(i) If the algorithm computes $(v'_1, \ldots, v'_n) \in \mathbb{Q}^n$ in step
(1), and the cell $C$ of $H$ containing $(v'_1, \ldots, v'_n)$ also contains a
point in $\mathbb{Z}^n$, then step (3) finds a solution
$(v_1, \ldots, v_n) \in \mathbb{Z}^n$; and,

(ii) If the algorithm computes $(v'_1, \ldots, v'_n) \in \mathbb{Q}^n$ in step
(1), and the cell $C$ of $H$ containing $(v'_1, \ldots, v'_n)$ does not contain a
point in $\mathbb{Z}^n$, then step (3) does not find a solution. By construction
of $\Phi''$ in step (5), the solutions to the formula $\Phi''$ are the solutions
of $\Phi$, minus the points in $C$. As $C$ did not contain a solution, we have
that $\Phi$ has a solution if and only if $\Phi''$ has a solution. Moreover,
there exists no $(v''_1, \ldots, v''_n) \in C$ such that
$\Phi''(v''_1, \ldots, v''_n)$ evaluates to true.

To show that the algorithm can be implemented to run in polynomial time,
we first argue that there are at most a polynomial number of iterations. This
follows from the observation in step (2) that the points satisfying
$\Phi'(x_1, \ldots, x_n)$ are precisely all the points in a cell $C$ of $H$.
Indeed, when we do not find a solution to the problem in step (3), we adapt
$\Phi$ to exclude all the points in cell $C$ in step (5). Hence, in the
following iteration, step (1) cannot find a solution in cell $C$ anymore. It
follows that the number of iterations is bounded by the
number of cells in H, which is 0(m n ) (see, e.g. 0, or Theorem 1.3 in ^I] for a 
more recent reference). 

Finally, we argue that every step of the algorithm can be computed in PTIME.

Step (1) can be solved by the quantifier elimination method of Ferrante
and Rackoff (Lemma .4). Proposition .5 states that we can find
$(v'_1, \ldots, v'_n)$ in polynomial time.

Step (2) is easily seen to be in PTIME: we only have to evaluate every
$\varphi'_i$ once on $(v'_1, \ldots, v'_n)$.

Step (3) can be executed in PTIME by Lenstra's algorithm for integer linear
programming with a fixed number of variables [18].

Step (4) is in PTIME (trivial).

Step (5) replaces $\Phi(x_1, \ldots, x_n)$ by the formula
$\Phi(x_1, \ldots, x_n) \wedge \neg \Phi'(x_1, \ldots, x_n)$.
As the size of $\Phi'(x_1, \ldots, x_n)$ is bounded by $m$ plus the sum of the
sizes of $\varphi''_i$ for $i = 1, \ldots, m$, the formula $\Phi$ only grows by a
linear term in each iteration. As the number of iterations is bounded by a
polynomial, the maximum size of $\Phi$ is also bounded by a polynomial.

It follows that the algorithm is correct, and can be implemented to run in
polynomial time. □
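Step (2) of the algorithm, which identifies the cell containing the rational point found in step (1), can be sketched as follows. This covers only that one step; the satisfiability test of step (1) and the integer programming call of step (3) are not modeled. The representation of a hyperplane as a pair `(coefficients, constant)` and the names `sign_vector` and `same_cell` are assumptions of the sketch.

```python
from fractions import Fraction

def sign_vector(hyperplanes, point):
    """For each hyperplane k_1 x_1 + ... + k_n x_n + k = 0, return the
    relation theta in {'<', '>', '='} that the point satisfies."""
    signs = []
    for coeffs, const in hyperplanes:
        value = sum(Fraction(k) * Fraction(v) for k, v in zip(coeffs, point))
        value += Fraction(const)
        signs.append('<' if value < 0 else '>' if value > 0 else '=')
    return signs

def same_cell(hyperplanes, y, z):
    """Two points lie in the same cell iff their sign vectors agree."""
    return sign_vector(hyperplanes, y) == sign_vector(hyperplanes, z)
```

The sign vector returned for $(v'_1, \ldots, v'_n)$ is exactly the list of relations $\theta_1, \ldots, \theta_m$ from which the conjunction $\Phi'$ is built.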



Corollary .7. There exists a PTIME algorithm that decides whether a Boolean
combination of linear (in)equalities with rational factors and a fixed number of
variables has a solution of positive integers.

Proof. Given a Boolean combination $\Phi(x_1, \ldots, x_n)$ of linear
(in)equalities with rational factors, we simply apply the algorithm of
Proposition .6 to the formula

$\Phi'(x_1, \ldots, x_n) = \Phi(x_1, \ldots, x_n) \wedge \bigwedge_{1 \le i \le n} x_i > 0$.

□

In the following proposition, we treat the emptiness problem for DTDs: given
a DTD $d$, is $L(d) = \emptyset$? Note that $L(d)$ can be empty even when $d$ is
not. For instance, the trivial grammar $a \to a$ generates no finite trees.

Proposition .8. The emptiness problem is (1) PTIME-complete for DTD(NFA)
and DTD(DFA), and (2) coNP-complete for DTD(SL).

Proof. (1) The upper bound follows from a reduction to the emptiness problem 
for NTA(NFA)s, which is in ptime (cf. Theorem 19(1) in 

For the lower bound, we reduce from path systems JUj , which is known to 
be PTIME-complete. PATH SYSTEMS is the decision problem defined as follows:
given a finite set of propositions $P$, a set $A \subseteq P$ of axioms, a set
$R \subseteq P \times P \times P$ of inference rules and some $p \in P$, is $p$
provable from $A$ using $R$? Here, (i) every proposition in $A$ is provable from
$A$ using $R$ and, (ii) if $(p_1, p_2, p_3) \in R$ and if $p_1$ and $p_2$ are
provable from $A$ using $R$, then $p_3$ is also provable from $A$ using $R$.

In our reduction, we construct a DTD $(d, p)$ such that $(d, p)$ is not empty if
and only if $p$ is provable. Concretely, for every $(a, b, c) \in R$, we add the
string $ab$ to $d(c)$; for every $a \in A$, $d(a) = \{\varepsilon\}$. Clearly,
$(d, p)$ satisfies the requirements.
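The reduction can be illustrated by a short Python sketch. It is a simplification and not the construction as stated: content models are represented as finite sets of strings (tuples of symbols) rather than NFAs, and for an axiom the empty string is added to its content model instead of setting it to exactly $\{\varepsilon\}$. The function names are hypothetical.

```python
def path_system_to_dtd(axioms, rules):
    """Build d: symbol -> set of allowed child strings (tuples of symbols)."""
    d = {}
    for a in axioms:
        d.setdefault(a, set()).add(())        # axioms may be leaves
    for a, b, c in rules:
        d.setdefault(c, set()).add((a, b))    # rule (a, b, c): c has children a b
    return d

def nonempty_symbols(d):
    """Symbols a for which (d, a) defines at least one finite tree."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for sym, strings in d.items():
            if sym in productive:
                continue
            # sym is productive if some child string uses only productive symbols
            if any(all(x in productive for x in s) for s in strings):
                productive.add(sym)
                changed = True
    return productive
```

A proposition $p$ is then provable exactly when `p in nonempty_symbols(path_system_to_dtd(A, R))`, mirroring the equivalence claimed in the proof.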

(2) We provide an NP algorithm to check whether a DTD(SL) $(d, r)$ defines
a non-empty language. Intuitively, the algorithm computes the set
$S = \{a \in \Sigma \mid L((d, a)) \ne \emptyset\}$ in an iterative manner and
accepts when $r \in S$.

Let $k$ be the largest integer occurring in any SL-formula in $d$. Initially,
$S$ is empty.

The iterative step is as follows. Guess a sequence of different symbols
$b_1, \ldots, b_m$ in $S$. Then guess a vector
$(v_1, \ldots, v_m) \in \{0, \ldots, k + 1\}^m$. Intuitively, the vector
$(v_1, \ldots, v_m)$ represents the string $b_1^{v_1} \cdots b_m^{v_m}$. From a
lemma in the body of the paper, it follows that any SL-formula in $d$ is
satisfiable if and only if it is satisfiable by a string of the form
$a_1^{u_1} \cdots a_n^{u_n}$, where $\Sigma = \{a_1, \ldots, a_n\}$, and for all
$i = 1, \ldots, n$, $u_i \in \{0, \ldots, k + 1\}$.
Now add to $S$ each $a \in \Sigma$ for which
$b_1^{v_1} \cdots b_m^{v_m} \models d(a)$. Note that this condition can be
checked in PTIME. Repeat the iterative step at most $|\Sigma|$ times and accept
when $r \in S$.

The coNP lower bound follows from an easy reduction of non-satisfiability.
Let $\phi$ be a propositional formula with variables $x_1, \ldots, x_n$. Let
$\Sigma$ be the set $\{a_1, \ldots, a_n\}$. Let $(d, r)$ be the DTD where
$d(r) = \phi'$, where $\phi'$ is the formula $\phi$ in which every $x_i$ is
replaced by $a_i^{\ge 1}$. Hence, $(d, r)$ defines the empty tree language if and
only if $\phi$ is unsatisfiable. □

Reducing a grammar is the act of finding an equivalent reduced grammar. 

Corollary .9. Reducing a DTD(NFA) is PTIME-complete; and reducing a DTD(SL)
is NP-complete.

Proof. We first show the upper bounds. Let $(d, s)$ be a DTD(NFA) or DTD(SL)
over alphabet $\Sigma$. In both cases, the algorithm performs the following
steps for each $a \in \Sigma$:

(i) Test whether $a$ is reachable from $s$. That is, test whether there is a
sequence of $\Sigma$-symbols $a_1, \ldots, a_n$ such that

• $a_1 = s$ and $a_n = a$; and

• for every $i = 2, \ldots, n$, there exists a string
$w_1 a_i w_2 \in d(a_{i-1})$, for $w_1, w_2 \in \Sigma^*$.

(ii) Test whether $L((d, a)) \ne \emptyset$.

Symbols that do not pass tests (i) and (ii) are deleted from the alphabet of
the DTD. Let $c$ be such a deleted symbol. In the case of SL, every atom
$c^{\ge i}$ and $c^{=i}$ is replaced by true when $i = 0$ and false otherwise.
Further, in the case of NFAs, every transition mentioning $c$ is removed.

In the case of a DTD(NFA), step (i) is in NLOGSPACE and step (ii) is in
PTIME. In the case of a DTD(SL), both tests (i) and (ii) are in NP.
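Test (i) is a plain graph search. A minimal Python sketch is given below, assuming (unlike the DTD(NFA) and DTD(SL) settings of the corollary) that each content model is a finite set of strings, represented as tuples of symbols. The name `reachable` is hypothetical.

```python
from collections import deque

def reachable(d, start):
    """Symbols a for which a sequence a_1 = start, ..., a_n = a exists such
    that each a_i occurs in some string of d(a_{i-1})."""
    seen = {start}
    queue = deque([start])
    while queue:
        sym = queue.popleft()
        for string in d.get(sym, ()):
            for child in string:
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
    return seen
```

For a DTD(NFA), the same search would follow the symbols labeling transitions of the NFA for $d(a_{i-1})$ instead of iterating over an explicit set of strings.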

For the lower bound, we argue that 

(1) if there exists an NLOGSPACE-algorithm for reducing a DTD(NFA), then
emptiness of a DTD(NFA) is in NLOGSPACE; and,

(2) if there exists a PTIME-algorithm for reducing a DTD(SL), then emptiness
of a DTD(SL) is in PTIME.

Statements (1) and (2) are easy to show: one only has to observe that an
emptiness test of a DTD can be obtained by reducing the DTD and verifying
whether the alphabet of the DTD still contains the start symbol. □